[
https://issues.apache.org/jira/browse/CASSANALYTICS-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sudipta Laha updated CASSANALYTICS-79:
--------------------------------------
Change Category: Operability
Complexity: Challenging
Component/s: Reader
Reviewers: Doug Rohrer, Francisco Guerrero, Saranya Krishnakumar
Status: Open (was: Triage Needed)
> Make Token ranges calculation rack aware for
> spark.data.partitioner.CassandraRing
> ---------------------------------------------------------------------------------
>
> Key: CASSANALYTICS-79
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-79
> Project: Apache Cassandra Analytics
> Issue Type: Improvement
> Components: Reader
> Reporter: Liu Cao
> Assignee: Sudipta Laha
> Priority: Normal
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Discussion context:
> [https://github.com/apache/cassandra-analytics/pull/93#issuecomment-3099771451]
>
> Per the [comment|#L61], the token range calculation used in the Cassandra
> Analytics bulk read data source is not rack aware. This leads to picking the
> wrong replicas when reading SSTables.
>
> The calculation logic is
> [here|https://github.com/apache/cassandra-analytics/blob/6e1d5257a8d6c910a42751475612145533ae3b1d/cassandra-analytics-common/src/main/java/org/apache/cassandra/spark/utils/RangeUtils.java#L158]
>
> Concrete example of the breakage:
> We have a 6-node cluster with vnode=16, which yields a list of 96 instances
> (6 x 16).
> These 6 nodes reside in 3 different Availability Zones of the same AWS region
> (2 nodes in each AZ). In the gist below, node UUIDs are replaced with
> human-readable identifiers for clarity - us-west-2a-node1, us-west-2a-node2,
> etc.
> The replication factor for the keyspace is 3 and we use
> NetworkTopologyStrategy.
> [https://gist.github.com/liucao-dd/542eb0868d2e080733ca3fe127719114]
>
> Entry at index 1 -
> {code:java}
> {"token"="-9067202222264017285", "node"="us-west-2b-node1",
> "dc"="us-west-2"}{code}
> (1 + 96 - 3) % 96 = 94, so we look at the entry at index 94 -
> {code:java}
> {"token"="8821921454609098249", "node"="us-west-2b-node2",
> "dc"="us-west-2"}{code}
> indicating that this vnode on {{us-west-2b-node1}} holds replica data for the
> token ranges {{(8821921454609098249, MAX]}} and {{(MIN,
> -9067202222264017285)}}.
>
> Entry at index 2 -
> {code:java}
> {"token"="-8862464739686088316", "node"="us-west-2b-node2",
> "dc"="us-west-2"}{code}
> (2 + 96 - 3) % 96 = 95, so we look at the entry at index 95 -
> {code:java}
> {"token"="8957072497331100633", "node"="us-west-2c-node1",
> "dc"="us-west-2"}{code}
> indicating that this vnode on {{us-west-2b-node2}} holds replica data for the
> token ranges {{(8957072497331100633, MAX]}} and {{(MIN,
> -8862464739686088316)}}.
> Now consider the token range {{(8957072497331100633, MAX]}} specifically.
> This range should be replicated on one node in each AZ. However, the
> calculation has both {{us-west-2b-node1}} and {{us-west-2b-node2}}, which are
> in the same AZ (rack), holding replica data for it. This contradicts
> rack-aware replica placement, under which us-west-2a and us-west-2c would each
> hold one replica for this token range (RF=3 with 3 AZs/racks). In reality,
> us-west-2b-node2 does not hold replica data for it.
>
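
The index arithmetic in the example above can be sketched as follows. This only illustrates the offset formula quoted in this issue under the stated numbers (96 entries, RF=3); it is not the code in RangeUtils, and the class and method names are hypothetical.

```java
final class NaiveRangeStart {
    /**
     * Hypothetical helper mirroring the arithmetic quoted in the issue:
     * for the entry at position {@code index}, step back {@code rf}
     * positions around a ring of {@code ringSize} entries, ignoring racks.
     */
    static int naiveStartIndex(int index, int ringSize, int rf) {
        return (index + ringSize - rf) % ringSize;
    }

    public static void main(String[] args) {
        // Values taken from the example: 6 nodes x 16 vnodes = 96 entries, RF = 3.
        System.out.println(naiveStartIndex(1, 96, 3)); // prints 94
        System.out.println(naiveStartIndex(2, 96, 3)); // prints 95
    }
}
```

Because the offset depends only on ring position, two adjacent entries owned by nodes in the same rack end up sharing a range, which is the overlap described above.
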
> This issue is not limited to vnodes; it is a general problem whenever replica
> placement on the token ring is rack aware.
>
> One solution is to use the sidecar's {{/api/v1/token-range-replicas}}
> endpoint, which returns the token ranges and their corresponding replicas
> directly.
>
> Alternatively, given that we already call the ring endpoint
> {{api/v1/cassandra/ring}}, whose response already includes the rack
> information for each node, we can consider adding an optional new attribute to
> the class CassandraInstance and calculating the token ranges according to the
> replication strategy as well, following standard Cassandra logic such as that
> specified in
> [https://github.com/datastax/python-driver/blob/0979b897549de4578eda31dfd9e1e1a2f080c926/cassandra/metadata.py#L581]
> or
> [https://github.com/apache/cassandra-java-driver/blob/17ebe6092e2877d8c524e07489c4c3d005cfeea5/core/src/main/java/com/datastax/oss/driver/internal/core/metadata/token/NetworkTopologyReplicationStrategy.java#L59]
>
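
As a rough illustration of the second option, the sketch below walks a ring clockwise and prefers nodes in racks not yet used, in the spirit of the driver implementations linked above. It is a simplified single-datacenter approximation, not the Cassandra or driver code. The `RingEntry` type, the method names, the tokens, and the ring order are all hypothetical, and the rack-unaware walk is included only for contrast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RackAwareReplicas {
    /** Hypothetical ring entry: a token, its owning node, and that node's rack. */
    static final class RingEntry {
        final long token;
        final String node;
        final String rack;

        RingEntry(long token, String node, String rack) {
            this.token = token;
            this.node = node;
            this.rack = rack;
        }
    }

    /** Rack-aware walk (single DC): take one node per unseen rack first. */
    static List<String> rackAware(List<RingEntry> ring, int start, int rf) {
        Set<String> allRacks = new HashSet<>();
        for (RingEntry e : ring) {
            allRacks.add(e.rack);
        }
        Set<String> replicas = new LinkedHashSet<>();
        Set<String> seenRacks = new HashSet<>();
        List<String> skipped = new ArrayList<>();
        for (int j = 0; j < ring.size() && replicas.size() < rf; j++) {
            RingEntry e = ring.get((start + j) % ring.size());
            if (replicas.contains(e.node)) {
                continue; // another vnode of a node that is already a replica
            }
            if (seenRacks.size() == allRacks.size()) {
                replicas.add(e.node); // every rack is used, so any node qualifies
            } else if (seenRacks.contains(e.rack)) {
                skipped.add(e.node); // rack already used, keep as a fallback
            } else {
                replicas.add(e.node);
                seenRacks.add(e.rack);
                if (seenRacks.size() == allRacks.size()) {
                    // Out of distinct racks: fall back to the skipped nodes in order.
                    for (String s : skipped) {
                        if (replicas.size() >= rf) {
                            break;
                        }
                        replicas.add(s);
                    }
                }
            }
        }
        return new ArrayList<>(replicas);
    }

    /** Rack-unaware walk, for contrast: the next rf distinct nodes clockwise. */
    static List<String> rackUnaware(List<RingEntry> ring, int start, int rf) {
        Set<String> replicas = new LinkedHashSet<>();
        for (int j = 0; j < ring.size() && replicas.size() < rf; j++) {
            replicas.add(ring.get((start + j) % ring.size()).node);
        }
        return new ArrayList<>(replicas);
    }

    /** Made-up six-entry ring; tokens and ordering are not from the gist. */
    static List<RingEntry> sampleRing() {
        return Arrays.asList(
            new RingEntry(10L, "us-west-2b-node1", "us-west-2b"),
            new RingEntry(20L, "us-west-2b-node2", "us-west-2b"),
            new RingEntry(30L, "us-west-2c-node1", "us-west-2c"),
            new RingEntry(40L, "us-west-2a-node1", "us-west-2a"),
            new RingEntry(50L, "us-west-2a-node2", "us-west-2a"),
            new RingEntry(60L, "us-west-2c-node2", "us-west-2c"));
    }

    public static void main(String[] args) {
        List<RingEntry> ring = sampleRing();
        System.out.println("rack-unaware: " + rackUnaware(ring, 0, 3));
        System.out.println("rack-aware:   " + rackAware(ring, 0, 3));
    }
}
```

On this made-up ring, the rack-unaware walk places two replicas in us-west-2b, while the rack-aware walk skips us-west-2b-node2 and selects one node in each of the three racks.
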