sklaha opened a new pull request, #153:
URL: https://github.com/apache/cassandra-analytics/pull/153
### Description
The bulk reader supports various consistency levels, including LOCAL_QUORUM,
but it does not support EACH_QUORUM. This limitation is a significant problem
for customers who need globally consistent data in their analytics workloads.
This PR addresses the limitation by implementing EACH_QUORUM support in the
bulk reader.
### Reading SSTables with EACH_QUORUM consistency
* Split the keyspace token range across Spark partitions.
* For each Spark partition:
  * Get the Cassandra replicas responsible for storing its token range.
  * If the consistency level is EACH_QUORUM: calculate the number of
    replicas needed for a quorum in each datacenter and prepare a
    `Map<Datacenter, Number of replicas>`.
  * Do the following for each datacenter:
    * Sort the list of replicas based on the availability hint from the
      snapshot request, so that available replicas are at the beginning.
    * Split the list of replicas into 2 sets, where N is the number of
      replicas needed for that datacenter:
      * Primary replicas: the first N replicas in the list
      * Secondary replicas: the remaining replicas in the list
    * Read SSTables from N replicas, or fail:
      * Try to read from a primary replica.
      * If the read fails for a primary replica, try a secondary replica.
  * Compact the SSTables.
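The per-datacenter selection steps above can be sketched in Java 17 roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Replica`, its availability flag, the `tryRead` callback (standing in for streaming SSTables from one instance), and the use of `IllegalStateException` in place of the bulk reader's actual error are all hypothetical. The final compaction step is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public final class EachQuorumSketch {

    /** Hypothetical replica: instance name, datacenter, and the availability hint from the snapshot request. */
    public record Replica(String name, String datacenter, boolean availableHint) {}

    /** Replicas needed for a quorum of rf replicas: floor(rf / 2) + 1. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** Reads one token range from a quorum of replicas in every datacenter, or fails. */
    static Map<String, List<Replica>> readEachQuorum(List<Replica> replicas,
                                                     Map<String, Integer> rfPerDc,
                                                     Predicate<Replica> tryRead) {
        Map<String, List<Replica>> readFrom = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : rfPerDc.entrySet()) {
            String dc = entry.getKey();
            int needed = quorum(entry.getValue()); // N for this datacenter
            List<Replica> dcReplicas = replicas.stream()
                    .filter(r -> r.datacenter().equals(dc))
                    .toList();
            readFrom.put(dc, readFromDatacenter(dc, dcReplicas, needed, tryRead));
        }
        return readFrom;
    }

    /** Reads from `needed` replicas of one datacenter, falling back from primaries to secondaries. */
    static List<Replica> readFromDatacenter(String dc, List<Replica> dcReplicas, int needed,
                                            Predicate<Replica> tryRead) {
        List<Replica> sorted = new ArrayList<>(dcReplicas);
        // Available replicas first (false sorts before true); List.sort is stable for ties.
        sorted.sort(Comparator.comparing((Replica r) -> !r.availableHint()));

        int primaryCount = Math.min(needed, sorted.size());
        List<Replica> primaries = sorted.subList(0, primaryCount);
        Deque<Replica> secondaries = new ArrayDeque<>(sorted.subList(primaryCount, sorted.size()));

        List<Replica> succeeded = new ArrayList<>();
        for (Replica primary : primaries) {
            if (tryRead.test(primary)) {
                succeeded.add(primary);
                continue;
            }
            // The primary failed: try secondaries until one succeeds or none are left.
            while (!secondaries.isEmpty()) {
                Replica secondary = secondaries.poll();
                if (tryRead.test(secondary)) {
                    succeeded.add(secondary);
                    break;
                }
            }
        }
        if (succeeded.size() < needed) {
            throw new IllegalStateException("Not enough replicas in " + dc + ": needed "
                    + needed + ", read " + succeeded.size());
        }
        return succeeded;
    }
}
```

For example, with every availability hint set and reads from Node2 failing, datacenter1 reads from Node1 and then falls back from Node2 to the secondary Node3, while a datacenter with only one readable replica out of three throws.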
### Integration Testing Strategy
#### Test Setup
* Cassandra cluster:
  * datacenter1: [Node1, Node2, Node3], RF=3
  * datacenter2: [Node4, Node5, Node6], RF=3
* Replication strategy:
  `{'class': 'NetworkTopologyStrategy', 'datacenter1': 3, 'datacenter2': 3}`
* Read repair disabled.
* Auto repair disabled.
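As a worked example for this setup, the sketch below computes how many replicas each consistency level requires, using Cassandra's quorum formula floor(RF / 2) + 1, applied to the sum of replication factors for QUORUM and to each datacenter separately for EACH_QUORUM. The class and method names are hypothetical and are not part of cassandra-analytics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class QuorumMath {

    /** Replicas needed for a quorum of rf replicas: floor(rf / 2) + 1. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** ALL: every replica in every datacenter. */
    static int all(Map<String, Integer> rfPerDc) {
        return rfPerDc.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** QUORUM: a quorum of the replication factors summed across datacenters. */
    static int globalQuorum(Map<String, Integer> rfPerDc) {
        return quorum(all(rfPerDc));
    }

    /** EACH_QUORUM: a quorum computed separately for each datacenter. */
    static Map<String, Integer> eachQuorum(Map<String, Integer> rfPerDc) {
        Map<String, Integer> perDc = new LinkedHashMap<>();
        rfPerDc.forEach((dc, rf) -> perDc.put(dc, quorum(rf)));
        return perDc;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = new LinkedHashMap<>();
        rf.put("datacenter1", 3);
        rf.put("datacenter2", 3);
        System.out.println("ALL: " + all(rf));                // ALL: 6
        System.out.println("QUORUM: " + globalQuorum(rf));    // QUORUM: 4
        System.out.println("EACH_QUORUM: " + eachQuorum(rf)); // EACH_QUORUM: {datacenter1=2, datacenter2=2}
    }
}
```

QUORUM can be met by all three datacenter1 replicas plus any one datacenter2 replica, whereas EACH_QUORUM always needs two datacenter2 replicas; this is why the two levels can diverge in the test cases.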
#### Test Cases
* TC#1: Happy path, ALL == QUORUM == EACH_QUORUM
  * Write [K1, V1] with ALL consistency level
  * Bulk read separately with ALL, QUORUM and EACH_QUORUM
  * Use the driver to read separately with ALL, QUORUM and EACH_QUORUM
  * All bulk reads and driver reads should return the same data
* TC#2: QUORUM != EACH_QUORUM
  * Write [K1, V1] with ALL consistency level
  * Write [K1, V2] locally to only Node5 and Node6
  * Bulk read separately with ALL, QUORUM and EACH_QUORUM
  * Use the driver to read separately with ALL, QUORUM and EACH_QUORUM
  * ALL should return [K1, V2], QUORUM should return [K1, V1], and
    EACH_QUORUM should return [K1, V2]
  * Driver results should match bulk read results
* TC#3: EACH_QUORUM succeeds with a single node down in each DC
  * Write [K1, V1] with ALL consistency level
  * Shut down Node1 and Node4
  * Bulk read with EACH_QUORUM
  * Use the driver to read with EACH_QUORUM
  * Bulk read should succeed and return [K1, V1]
  * Driver results should match bulk read results
* TC#4: QUORUM succeeds but EACH_QUORUM fails with multiple nodes down in
  one DC
  * Write [K1, V1] with ALL consistency level
  * Shut down Node5 and Node6
  * Bulk read separately with QUORUM and EACH_QUORUM
  * Use the driver to read with QUORUM and EACH_QUORUM
  * QUORUM should succeed with [K1, V1] as the result
  * EACH_QUORUM should fail with a NotEnoughReplicas error
  * Driver results should match bulk read results
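The availability expectations in TC#3 and TC#4 can be checked with a small sketch that counts live replicas per datacenter. It assumes every node in a datacenter replicates every token, which holds in this setup because each datacenter has three nodes and RF=3. The class and method names are hypothetical, and the sketch only checks whether enough replicas are up, not which value they return.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class LiveReplicaCheck {

    /** Replicas needed for a quorum of rf replicas: floor(rf / 2) + 1. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // Test topology from the setup: three nodes per datacenter, RF=3 in each.
    static final Map<String, List<String>> TOPOLOGY = Map.of(
            "datacenter1", List.of("Node1", "Node2", "Node3"),
            "datacenter2", List.of("Node4", "Node5", "Node6"));

    /** Number of nodes in the list that are not shut down. */
    static long live(List<String> nodes, Set<String> down) {
        return nodes.stream().filter(n -> !down.contains(n)).count();
    }

    /** QUORUM: enough live replicas across all datacenters combined. */
    static boolean quorumSatisfiable(Set<String> down) {
        int total = TOPOLOGY.values().stream().mapToInt(List::size).sum();
        long liveTotal = TOPOLOGY.values().stream().mapToLong(nodes -> live(nodes, down)).sum();
        return liveTotal >= quorum(total);
    }

    /** EACH_QUORUM: a quorum of live replicas in every datacenter. */
    static boolean eachQuorumSatisfiable(Set<String> down) {
        return TOPOLOGY.values().stream()
                .allMatch(nodes -> live(nodes, down) >= quorum(nodes.size()));
    }

    public static void main(String[] args) {
        Set<String> tc3Down = Set.of("Node1", "Node4");
        Set<String> tc4Down = Set.of("Node5", "Node6");
        System.out.println("TC#3 EACH_QUORUM: " + eachQuorumSatisfiable(tc3Down)); // true
        System.out.println("TC#4 QUORUM: " + quorumSatisfiable(tc4Down));          // true
        System.out.println("TC#4 EACH_QUORUM: " + eachQuorumSatisfiable(tc4Down)); // false
    }
}
```

In TC#4, four of six replicas remain up, which meets the global quorum of 4, but datacenter2 has only one live replica against a local quorum of 2.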