Posted by : Sushanth Friday, 7 January 2022

 

Consistency

Imagine a key-value store whose data must be replicated across multiple servers. Since the same data lives on several servers, it is important to keep read/write operations consistent.

Definitions:

N = the number of replicas.

W = the write quorum size. For a write operation to be considered successful, it must be acknowledged by at least W replicas.

R = the read quorum size. For a read operation to be considered successful, the coordinator must wait for responses from at least R replicas.

Consider the example shown in the figure below with N = 3.

W = 1 means that the coordinator must receive at least one acknowledgment before the write operation is considered successful. For instance, if we get an acknowledgment from s1, we no longer need to wait for acknowledgments from s0 and s2.

 A coordinator acts as a proxy between the client and the nodes.
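To make the coordinator's role concrete, here is a minimal Python sketch of a quorum write: it fans the write out to all N replicas in parallel and reports success as soon as W of them acknowledge. The replicas list and its put() method are hypothetical stand-ins, not any particular database's API.

from concurrent.futures import ThreadPoolExecutor, as_completed

def quorum_write(replicas, key, value, w):
    """Coordinator sketch: send the write to all N replicas in
    parallel and report success once W acknowledgments arrive.
    `replicas` is a hypothetical list of objects with a put() method."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.put, key, value) for r in replicas]
    acks = 0
    for fut in as_completed(futures):
        if fut.exception() is None:
            acks += 1                  # one more replica acknowledged
        if acks >= w:
            pool.shutdown(wait=False)  # don't block on slower replicas
            return True
    pool.shutdown(wait=False)
    return False                       # fewer than W replicas acknowledged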

The configuration of W, R and N is a typical trade-off between latency and consistency.

If W = 1 or R = 1, an operation returns quickly because the coordinator only needs to wait for a response from any one of the replicas.

If W or R > 1, the system offers better consistency; however, queries are slower because the coordinator must wait for the response from the slowest replica in the quorum.

Possible setups:

If R = 1 and W = N, the system is optimized for fast reads.

If W = 1 and R = N, the system is optimized for fast writes.

If W + R > N, strong consistency is guaranteed, because every read quorum must overlap every write quorum in at least one replica that holds the latest value (usually N = 3, W = R = 2).

 If W + R <= N, strong consistency is not guaranteed.

Depending on the requirements, we can tune the values of W, R, and N to achieve the desired level of consistency.
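As a quick sanity check, the overlap rule can be written down directly (a minimal sketch; the function name is illustrative):

def is_strongly_consistent(n, w, r):
    # W + R > N means every read quorum shares at least one replica
    # with every write quorum, so a read always reaches a replica
    # holding the latest acknowledged write.
    return w + r > n

print(is_strongly_consistent(3, 2, 2))  # True: the typical N=3, W=R=2 setup
print(is_strongly_consistent(3, 1, 1))  # False: a read may miss the write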

Consistency Model:

A consistency model is another important factor to consider when designing such a data store. A consistency model defines the degree of data consistency:

• Strong consistency: any read operation returns the value of the most recent write. A client never sees out-of-date data.

• Weak consistency: subsequent read operations may not see the most recent value.

• Eventual consistency: this is a specific form of weak consistency. Given enough time, all updates are propagated, and all replicas are consistent.

Strong consistency is usually achieved by forcing a replica not to accept new reads/writes until every replica has agreed on the current write. This approach is not ideal for highly available systems because it can block new operations. Dynamo and Cassandra adopt eventual consistency, which is the consistency model we recommend for our key-value store. With concurrent writes, eventual consistency allows inconsistent values to enter the system and forces the client to read the values and reconcile them.

Inconsistency resolution: Versioning

Replication gives high availability but causes inconsistencies among replicas. Versioning and vector clocks are used to solve inconsistency problems. Versioning means treating each data modification as a new immutable version of the data.

How does inconsistency happen?

As shown in the figure below, both replica nodes n1 and n2 have the same value. Server 1 and server 2 get the same value for the get(“name”) operation.



Now suppose server 1 changes the name to “testjohn1” and server 2 changes the name to “testjohn2”, as shown in the figure below. These two changes are performed simultaneously, so we now have conflicting values, called versions v1 and v2.

In this example, the original value can be ignored because both modifications were based on it. However, there is no clear way to resolve the conflict between the last two versions. To resolve this issue, we need a versioning system that can detect and reconcile conflicts.

Vector Clock 

A vector clock is a common technique to solve this problem.

A vector clock is a set of [server, version] pairs associated with a data item. It can be used to check whether one version precedes, succeeds, or is in conflict with another.

Assume a vector clock is represented by D([S1, v1], [S2, v2], …, [Sn, vn]), where D is a data item, vi is a version counter, and Si is a server number. If data item D is written to server Si, the system must perform one of the following tasks (sketched in code after the list).

• Increment vi if [Si, vi] exists.

• Otherwise, create a new entry [Si, 1].
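These two rules, plus the precedes/succeeds/conflict check, fit in a small Python class. This is a minimal sketch; the class and method names are illustrative, not from any particular system.

class VectorClock:
    """Maps a server id to its version counter for one data item."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def increment(self, server):
        # Increment vi if [Si, vi] exists; otherwise create [Si, 1].
        self.entries[server] = self.entries.get(server, 0) + 1

    def descends_from(self, other):
        # This version succeeds `other` if every counter in `other`
        # is less than or equal to the matching counter here.
        return all(self.entries.get(s, 0) >= v
                   for s, v in other.entries.items())

    def conflicts_with(self, other):
        # Siblings: neither version descends from the other.
        return (not self.descends_from(other)
                and not other.descends_from(self))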



1. A client writes a data item D1 to the system, and the write is handled by server Sx, which now has the vector clock D1([Sx, 1]).

2. Another client reads the latest D1, updates it to D2, and writes it back. D2 descends from D1, so it overwrites D1. Assume the write is handled by the same server Sx, which now has vector clock D2([Sx, 2]).

3. Another client reads the latest D2, updates it to D3, and writes it back. Assume the write is handled by server Sy, which now has vector clock D3([Sx, 2], [Sy, 1]).

4. Another client reads the latest D2, updates it to D4, and writes it back. Assume the write is handled by server Sz, which now has D4([Sx, 2], [Sz, 1]).

5. When another client reads D3 and D4, it discovers a conflict, which is caused by data item D2 being modified by both Sy and Sz. The conflict is resolved by the client and updated data is sent to the server. Assume the write is handled by Sx, which now has D5([Sx, 3], [Sy, 1], [Sz, 1]).
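Replaying these five steps with the sketch above (server ids Sx, Sy, Sz as in the example):

d2 = VectorClock()
d2.increment("Sx")                    # step 1: D1([Sx, 1])
d2.increment("Sx")                    # step 2: D2([Sx, 2])

d3 = VectorClock(d2.entries)
d3.increment("Sy")                    # step 3: D3([Sx, 2], [Sy, 1])
d4 = VectorClock(d2.entries)
d4.increment("Sz")                    # step 4: D4([Sx, 2], [Sz, 1])

print(d3.descends_from(d2))           # True: D3 may overwrite D2
print(d3.conflicts_with(d4))          # True: step 5's conflict is detected

# The client reconciles D3 and D4, and Sx handles the new write (D5).
merged = {s: max(d3.entries.get(s, 0), d4.entries.get(s, 0))
          for s in set(d3.entries) | set(d4.entries)}
d5 = VectorClock(merged)
d5.increment("Sx")
print(d5.entries)                     # {'Sx': 3, 'Sy': 1, 'Sz': 1} (key order may vary)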

Downsides:

1. Vector clocks add complexity to the client because it needs to implement conflict-resolution logic.

2. The [server, version] pairs in the vector clock can grow rapidly. To fix this problem, we set a threshold on the length; if it is exceeded, the oldest pairs are removed. This can lead to inefficiencies in reconciliation because the descendant relationship can no longer be determined accurately.
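A sketch of that truncation, assuming for illustration that entries are kept oldest-first; a real system like Dynamo stores a timestamp with each pair and drops the least recently updated one:

def prune(clock, limit):
    # Drop [server, version] pairs beyond `limit`, oldest first.
    # Python dicts preserve insertion order, which stands in here
    # for the per-pair timestamps a real system would keep.
    while len(clock.entries) > limit:
        oldest_server = next(iter(clock.entries))
        del clock.entries[oldest_server]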

