Consistency
Imagine a key-value store in which data must be stored on multiple servers. Since the same data is replicated across servers, it is important to keep read and write operations consistent across replicas.
Definitions:
N = the number of replicas
W = the write quorum size. For a write operation to be considered successful, it must be acknowledged by at least W replicas.
R = the read quorum size. For a read operation to be considered successful, it must wait for responses from at least R replicas.
Consider the example shown in the figure below with N = 3.
W = 1 means that the coordinator must receive at least one acknowledgment before the write operation is considered successful. For instance, if we get an acknowledgment from s1, we no longer need to wait for acknowledgments from s0 and s2.
A coordinator acts as a proxy
between the client and the nodes.
The configuration of W, R, and N is a typical trade-off between latency and consistency.
If W = 1 or R = 1, an operation returns quickly because the coordinator only needs to wait for a response from any one of the replicas.
If W > 1 or R > 1, the system offers better consistency; however, the query is slower because the coordinator must wait for the response from the slowest replica in the quorum.
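As a rough sketch of this write path, the snippet below shows a coordinator that forwards a write to all N replicas and returns as soon as W acknowledgments arrive. The replica interface (a blocking write(key, value) method that returns True on success) is an assumption made for illustration, not the API of any particular store.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def coordinate_write(replicas, key, value, w):
    """Send the write to all replicas and succeed once w of them acknowledge."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    # replica.write(key, value) is a hypothetical blocking call returning True on success.
    futures = [pool.submit(replica.write, key, value) for replica in replicas]
    acks = 0
    try:
        for future in as_completed(futures):
            try:
                if future.result():
                    acks += 1   # one more replica acknowledged the write
            except Exception:
                continue        # a failed replica simply does not count toward the quorum
            if acks >= w:
                return True     # e.g. with W = 1, the first acknowledgment is enough
        return False            # fewer than w replicas acknowledged
    finally:
        pool.shutdown(wait=False)  # do not keep the coordinator blocked on slow replicas
```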
Possible setups:
• If R = 1 and W = N, the system is optimized for fast reads.
• If W = 1 and R = N, the system is optimized for fast writes.
• If W + R > N, strong consistency is guaranteed (a typical configuration is N = 3, W = R = 2).
• If W + R <= N, strong consistency is not guaranteed.
Depending on the requirement, we can tune the values of W, R, and N to achieve the desired level of consistency.
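To make these setups concrete, here is a small sketch (the function name is ours, purely illustrative) that classifies a given (N, W, R) configuration:

```python
def describe_quorum(n, w, r):
    """Summarize what a given replication configuration optimizes for."""
    notes = []
    if r == 1 and w == n:
        notes.append("optimized for fast reads")
    if w == 1 and r == n:
        notes.append("optimized for fast writes")
    if w + r > n:
        notes.append("strong consistency guaranteed (read and write quorums overlap)")
    else:
        notes.append("strong consistency not guaranteed")
    return notes

# The typical configuration mentioned above:
print(describe_quorum(n=3, w=2, r=2))
# ['strong consistency guaranteed (read and write quorums overlap)']
```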
Consistency Model:
The consistency model is another important factor to consider when designing such a data store. A consistency model defines the degree of data consistency:
• Strong consistency: any read operation returns a value corresponding to the result of the most recent successful write. A client never sees out-of-date data.
• Weak consistency: subsequent read operations may not see the most recent value.
• Eventual consistency: a specific form of weak consistency. Given enough time, all updates are propagated and all replicas become consistent.
Strong consistency is usually achieved by forcing a replica not to accept new reads/writes until every replica has agreed on the current write. This approach is not ideal for highly available systems because it could block new operations. Dynamo and Cassandra adopt eventual consistency, which is the recommended consistency model for our key-value store. With concurrent writes, eventual consistency allows inconsistent values to enter the system and forces the client to read the values and reconcile them.
Inconsistency resolution: Versioning
Replication gives high availability but causes inconsistencies among replicas. Versioning and vector clocks are used to solve inconsistency problems. Versioning means treating each data modification as a new immutable version of data.
How does inconsistency happen?
As shown in the figure below, both replica nodes n1 and n2 have the same value. Server 1 and server 2 get the same value from a get("name") operation.
Next, server 1 changes the name to "testjohn1" and server 2 changes the name to "testjohn2", as shown in the figure below. These two changes are performed simultaneously, so we now have conflicting values, called versions v1 and v2.
In this example, the original value can be ignored because both modifications were based on it. However, there is no clear way to resolve the conflict between the last two versions. To resolve this issue, we need a versioning system that can detect and reconcile conflicts.
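As a toy illustration of the scenario above (the original value "testjohn" is assumed here, since the figure is not reproduced), two replicas that start from the same value and take concurrent writes end up with conflicting versions:

```python
# Two replicas start from the same value and receive concurrent updates.
n1 = {"name": "testjohn"}  # assumed original value, for illustration only
n2 = {"name": "testjohn"}

n1["name"] = "testjohn1"   # write handled through server 1
n2["name"] = "testjohn2"   # write handled through server 2

# Without version metadata there is no way to tell which value should win:
print(n1["name"], n2["name"])  # testjohn1 testjohn2 -> conflicting versions v1 and v2
```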
Vector Clock
A vector clock is a common technique to solve this problem.
A vector clock is a list of [server, version] pairs associated with a data item. It can be used to check whether one version precedes, succeeds, or is in conflict with another.
Assume a vector clock is represented by D([S1, v1], [S2, v2], …, [Sn, vn]), where D is a data item, vi is a version counter, and Si is a server number. If data item D is written to server Si, the system must perform one of the following tasks:
• Increment vi if [Si, vi] exists.
• Otherwise, create a new entry [Si, 1].
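A minimal sketch of this update rule, together with the precedes/succeeds/conflict check, might look like the following (the class and method names are our own, not from any library):

```python
class VectorClock:
    """A vector clock stored as a mapping from server id to version counter."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def increment(self, server):
        # Increment vi if [Si, vi] exists; otherwise create a new entry [Si, 1].
        clone = VectorClock(self.entries)
        clone.entries[server] = clone.entries.get(server, 0) + 1
        return clone

    def descends_from(self, other):
        # self succeeds other if every counter in other is <= the matching counter in self.
        return all(self.entries.get(s, 0) >= v for s, v in other.entries.items())

    def conflicts_with(self, other):
        # Two versions conflict when neither descends from the other.
        return not self.descends_from(other) and not other.descends_from(self)

    def __repr__(self):
        return "(" + ", ".join(f"[{s}, {v}]" for s, v in sorted(self.entries.items())) + ")"
```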
1. A client writes a data item D1 to the system, and the write is handled by server Sx, which now has the vector clock D1([Sx, 1]).
2. Another client reads the latest D1, updates it to D2, and writes it back. D2 descends from D1, so it overwrites D1. Assume the write is handled by the same server Sx, which now has vector clock D2([Sx, 2]).
3. Another client reads the latest D2, updates it to D3, and writes it back. Assume the write is handled by server Sy, which now has vector clock D3([Sx, 2], [Sy, 1]).
4. Another client reads the latest D2, updates it to D4, and writes it back. Assume the write is handled by server Sz, which now has D4([Sx, 2], [Sz, 1]).
5. When another client reads D3 and D4, it discovers a conflict, which is caused by data item D2 being modified by both Sy and Sz. The conflict is resolved by the client, and the updated data is sent to the server. Assume the write is handled by Sx, which now has D5([Sx, 3], [Sy, 1], [Sz, 1]).
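Using the sketch above, the five steps can be replayed to show exactly where the conflict appears (Sx, Sy, and Sz are plain string labels here):

```python
d1 = VectorClock().increment("Sx")            # D1([Sx, 1])
d2 = d1.increment("Sx")                       # D2([Sx, 2])
d3 = d2.increment("Sy")                       # D3([Sx, 2], [Sy, 1])
d4 = d2.increment("Sz")                       # D4([Sx, 2], [Sz, 1])

print(d2.descends_from(d1))   # True: D2 overwrites D1
print(d3.conflicts_with(d4))  # True: neither descends from the other

# The client reconciles D3 and D4 and writes the result back through Sx:
d5 = VectorClock({"Sx": 2, "Sy": 1, "Sz": 1}).increment("Sx")
print(d5)                     # ([Sx, 3], [Sy, 1], [Sz, 1])
```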
Downsides:
1. Vector clocks add complexity to the client because it needs to implement conflict resolution logic.
2. The [server, version] pairs in the vector clock could grow rapidly. To fix this problem, we set a threshold for the length, and if it exceeds the limit, the oldest pairs are removed. This can lead to inefficiencies in reconciliation because the descendant relationship can no longer be determined accurately.
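A rough sketch of this pruning step, assuming each entry also carries a last-update timestamp so that "oldest" is well defined (that timestamp and the threshold value are our assumptions, not part of the basic vector clock):

```python
def prune(entries, threshold=10):
    """Keep only the `threshold` most recently updated [server, version] pairs.

    `entries` is assumed to map server id -> (version, last_update_timestamp);
    storing a timestamp per entry is an extra assumption for illustration.
    """
    if len(entries) <= threshold:
        return dict(entries)
    newest = sorted(entries.items(), key=lambda kv: kv[1][1], reverse=True)[:threshold]
    return dict(newest)  # dropped pairs make ancestry checks approximate
```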