[ 
https://issues.apache.org/jira/browse/CASSANALYTICS-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudipta Laha updated CASSANALYTICS-79:
--------------------------------------
    Change Category: Operability
         Complexity: Challenging
        Component/s: Reader
          Reviewers: Doug Rohrer, Francisco Guerrero, Saranya Krishnakumar
             Status: Open  (was: Triage Needed)

> Make Token ranges calculation rack aware for 
> spark.data.partitioner.CassandraRing
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-79
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-79
>             Project: Apache Cassandra Analytics
>          Issue Type: Improvement
>          Components: Reader
>            Reporter: Liu Cao
>            Assignee: Sudipta Laha
>            Priority: Normal
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Discussion context: 
> [https://github.com/apache/cassandra-analytics/pull/93#issuecomment-3099771451]
>  
> Per the [comment|#L61], the token range calculation used by the Cassandra 
> Analytics bulk read data source is not rack aware. As a result, it can pick 
> the wrong replicas when reading SSTables.
>  
> The calculation logic is 
> [here|https://github.com/apache/cassandra-analytics/blob/6e1d5257a8d6c910a42751475612145533ae3b1d/cassandra-analytics-common/src/main/java/org/apache/cassandra/spark/utils/RangeUtils.java#L158]
>  
> Concrete example of breakage:
> We have a cluster of 6 nodes with 16 vnodes each, which produces a list of 96 
> instances.
> These 6 nodes reside in 3 different Availability Zones of the same AWS region 
> (2 nodes in each AZ). In the gist below, node UUIDs are replaced with 
> human-readable identifiers for clarity: us-west-2a-node1, us-west-2a-node2, 
> etc.
> The replication factor for the keyspace is 3 and we use 
> NetworkTopologyStrategy.
> [https://gist.github.com/liucao-dd/542eb0868d2e080733ca3fe127719114]
>  
> Entry at index 1:
> {code:java}
>  {"token"="-9067202222264017285", "node"="us-west-2b-node1", 
> "dc"="us-west-2"}{code}
> (1 + 96 - 3) % 96 = 94, so we look at the entry at index 94:
> {code:java}
> {"token"="8821921454609098249", "node"="us-west-2b-node2", 
> "dc"="us-west-2"}{code}
> This indicates that this vnode on {{us-west-2b-node1}} holds replica data for 
> the token ranges {{(8821921454609098249, MAX]}} and 
> {{(MIN, -9067202222264017285)}}.
>  
> Entry at index 2:
> {code:java}
> {"token"="-8862464739686088316", "node"="us-west-2b-node2", 
> "dc"="us-west-2"}{code}
> (2 + 96 - 3) % 96 = 95, so we look at the entry at index 95:
> {code:java}
> {"token"="8957072497331100633", "node"="us-west-2c-node1", 
> "dc"="us-west-2"}{code}
> This indicates that this vnode on {{us-west-2b-node2}} holds replica data for 
> the token ranges {{(8957072497331100633, MAX]}} and 
> {{(MIN, -8862464739686088316)}}.
> Now consider the token range {{(8957072497331100633, MAX]}} specifically. 
> It should be replicated on 1 node in each AZ. However, according to this 
> calculation, {{us-west-2b-node1}} and {{us-west-2b-node2}}, which are in the 
> same AZ (rack), both hold replica data for it. This contradicts rack-aware 
> replica placement, under which us-west-2a and us-west-2c would each hold 1 
> replica for this token range (RF=3 with 3 AZs/racks). In reality, 
> us-west-2b-node2 does not hold replica data for it.
>  
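The index arithmetic in the example above can be reproduced with a minimal sketch. It assumes, as the example does, that replicas are simply the next RF entries on the ring with no regard for racks. The class and method names here are hypothetical and are not taken from RangeUtils.java or CassandraRing.

```java
// Hypothetical sketch; names are illustrative and are not taken from
// RangeUtils.java or CassandraRing.
class NaiveRingSketch {
    // Index of the entry whose token opens the span replicated by the entry at
    // `index`, assuming replicas are simply the next `rf` entries on the ring.
    static int startIndex(int index, int ringSize, int rf) {
        return (index + ringSize - rf) % ringSize;
    }

    // True if, under the same rack-unaware assumption, the entry at `owner`
    // replicates the segment that begins at the token of entry `segmentStart`.
    static boolean replicates(int owner, int segmentStart, int ringSize, int rf) {
        return Math.floorMod(owner - segmentStart - 1, ringSize) < rf;
    }

    public static void main(String[] args) {
        int ringSize = 96; // 6 nodes x 16 vnodes, as in the example
        int rf = 3;
        System.out.println(startIndex(1, ringSize, rf)); // 94
        System.out.println(startIndex(2, ringSize, rf)); // 95
        // Entries 1 and 2 (both in us-west-2b) claim the segment after entry 95.
        System.out.println(replicates(1, 95, ringSize, rf)
                && replicates(2, 95, ringSize, rf)); // true
    }
}
```

Under this assumption the entries at index 1 and index 2 both claim the segment that starts at the token of entry 95, which is the same-rack overlap described above.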
> This issue is not limited to vnodes; it is a general problem whenever 
> replica placement on the token ring is rack aware.
>  
> One solution is to use the sidecar's {{/api/v1/token-range-replicas}} 
> endpoint, which returns the token ranges and their corresponding replicas 
> directly.
>  
> Alternatively, since we already call the ring endpoint 
> {{api/v1/cassandra/ring}}, whose response already includes the rack 
> information for each node, we can consider adding an optional new attribute 
> to the class CassandraInstance and calculating the token ranges according to 
> the replication strategy as well, following standard Cassandra logic such as 
> that implemented in 
> [https://github.com/datastax/python-driver/blob/0979b897549de4578eda31dfd9e1e1a2f080c926/cassandra/metadata.py#L581]
>  or 
> [https://github.com/apache/cassandra-java-driver/blob/17ebe6092e2877d8c524e07489c4c3d005cfeea5/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/token/NetworkTopologyReplicationStrategy.java#L59]
>  
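As a rough illustration of the second option, the sketch below walks the ring clockwise and prefers one replica per rack, in the spirit of the driver code linked above. It is a simplified, single-datacenter approximation, not the project's implementation: the Entry class, the method names, and the six-entry ring with small made-up tokens are all hypothetical (the ring is not the one from the gist).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

class RackAwareReplicaSketch {
    // Hypothetical ring entry; not the project's CassandraInstance class.
    static final class Entry {
        final long token;
        final String node;
        final String rack;

        Entry(long token, String node, String rack) {
            this.token = token;
            this.node = node;
            this.rack = rack;
        }
    }

    // Replicas for the range ending at ring.get(start), single DC only.
    static List<String> replicasFor(List<Entry> ring, int start, int rf) {
        Set<String> racksInDc = new HashSet<>();
        for (Entry e : ring) {
            racksInDc.add(e.rack);
        }
        Set<String> replicas = new LinkedHashSet<>(); // keeps walk order, dedupes vnodes
        Set<String> seenRacks = new HashSet<>();
        Queue<String> skipped = new ArrayDeque<>();
        for (int j = 0; j < ring.size() && replicas.size() < rf; j++) {
            Entry e = ring.get((start + j) % ring.size());
            if (seenRacks.size() == racksInDc.size()) {
                replicas.add(e.node); // every rack already used: take any node
            } else if (seenRacks.contains(e.rack)) {
                skipped.add(e.node); // rack already used: defer this node
            } else {
                replicas.add(e.node);
                seenRacks.add(e.rack);
                if (seenRacks.size() == racksInDc.size()) {
                    // All racks covered: fill remaining slots from deferred nodes.
                    while (!skipped.isEmpty() && replicas.size() < rf) {
                        replicas.add(skipped.poll());
                    }
                }
            }
        }
        return new ArrayList<>(replicas);
    }

    // Made-up ring: one token per node, tokens chosen only for illustration.
    static List<Entry> sampleRing() {
        return Arrays.asList(
                new Entry(10, "us-west-2b-node1", "us-west-2b"),
                new Entry(20, "us-west-2b-node2", "us-west-2b"),
                new Entry(30, "us-west-2a-node1", "us-west-2a"),
                new Entry(40, "us-west-2c-node1", "us-west-2c"),
                new Entry(50, "us-west-2a-node2", "us-west-2a"),
                new Entry(60, "us-west-2c-node2", "us-west-2c"));
    }

    public static void main(String[] args) {
        System.out.println(replicasFor(sampleRing(), 0, 3));
    }
}
```

For this made-up ring, starting at index 0 with RF=3, the walk defers us-west-2b-node2 because its rack is already covered and selects us-west-2b-node1, us-west-2a-node1 and us-west-2c-node1, whereas taking the next three entries would have placed two replicas in us-west-2b.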



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
