Server startup with replication¶
In addition to the recovery process described above, the server must take additional steps and precautions if replication is enabled.
Once again the startup procedure is initiated by the
One of the
box.cfg parameters may be
replication that specifies replication
source(-s). We will refer to this replica, which is starting up due to
as the “local” replica to distinguish it from the other replicas in a replica set,
which we will refer to as “distant” replicas.
#1. If there is no snapshot .snap file and the ``replication`` parameter is empty:
then the local replica assumes it is an unreplicated “standalone” instance, or is the first replica of a new replica set. It will generate new UUIDs for itself and for the replica set. The replica UUID is stored in the
_cluster space; the
replica set UUID is stored in the
_schema space. Since a snapshot contains all the
data in all the spaces, that means the local replica’s snapshot will contain the
replica UUID and the replica set UUID. Therefore, when the local replica restarts on
later occasions, it will be able to recover these UUIDs when it reads the .snap
#2. If there is no snapshot .snap file and the ``replication`` parameter is not empty
and the ``_cluster`` space contains no other replica UUIDs:
then the local replica assumes it is not a standalone instance, but is not yet part of a replica set. It must now join the replica set. It will send its replica UUID to the first distant replica which is listed in
replication and which will act as a
master. This is called the “join request”. When a distant replica receives a join
request, it will send back:
- the distant replica’s replica set UUID,
- the contents of the distant replica’s .snap file.
When the local replica receives this information, it puts the replica set UUID in its
_schemaspace, puts the distant replica’s UUID and connection information in its
_clusterspace, and makes a snapshot containing all the data sent by the distant replica. Then, if the local replica has data in its WAL .xlog files, it sends that data to the distant replica. The distant replica will receive this and update its own copy of the data, and add the local replica’s UUID to its
#3. If there is no snapshot .snap file and the ``replication`` parameter is not empty
and the ``_cluster`` space contains other replica UUIDs:
then the local replica assumes it is not a standalone instance, and is already part of a replica set. It will send its replica UUID and replica set UUID to all the distant replicas which are listed in
replication. This is called the “on-connect
handshake”. When a distant replica receives an on-connect handshake:
- the distant replica compares its own copy of the replica set UUID to the one in the on-connect handshake. If there is no match, then the handshake fails and the local replica will display an error.
- the distant replica looks for a record of the connecting instance in its
_clusterspace. If there is none, then the handshake fails.
Otherwise the handshake is successful. The distant replica will read any new information from its own .snap and .xlog files, and send the new requests to the local replica.
In the end … the local replica knows what replica set it belongs to, the distant replica knows that the local replica is a member of the replica set, and both replicas have the same database contents.
#4. If there is a snapshot file and replication source is not empty:
first the local replica goes through the recovery process described in the previous section, using its own .snap and .xlog files. Then it sends a “subscribe” request to all the other replicas of the replica set. The subscribe request contains the server vector clock. The vector clock has a collection of pairs ‘server id, lsn’ for every replica in the
_cluster system space. Each
distant replica, upon receiving a subscribe request, will read its .xlog files’
requests and send them to the local replica if (lsn of .xlog file request) is
greater than (lsn of the vector clock in the subscribe request). After all the
other replicas of the replica set have responded to the local replica’s subscribe
request, the replica startup is complete.
The following temporary limitations apply for versions 1.7 and 2.0:
- The URIs in the
replicationparameter should all be in the same order on all replicas. This is not mandatory but is an aid to consistency.
- The maximum number of entries in the
_clusterspace is 32. Tuples for out-of-date replicas are not automatically re-used, so if this 32-replica limit is reached, users may have to reorganize the