yifan-c commented on code in PR #93: URL: https://github.com/apache/cassandra-analytics/pull/93#discussion_r2078562579
##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/ClientConfig.java:
##########
@@ -53,6 +53,11 @@ public class ClientConfig
     public static final String SNAPSHOT_NAME_KEY = "snapshotName";
     public static final String DC_KEY = "dc";
     public static final String CREATE_SNAPSHOT_KEY = "createSnapshot";
+    /**
+     * Option to filter distinct instances before creating snapshots. This is only applicable when
+     * using vnodes where the token ring will contain multiple entries per instance.
+     */
+    public static final String CREATE_SNAPSHOT_FILTER_DISTINCT_INSTANCES_KEY = "createSnapshotFilterDistinctInstances";

Review Comment:
   > We could leverage the [NodeSettings#tokens](https://github.com/apache/cassandra-sidecar/blob/trunk/client-common/src/main/java/org/apache/cassandra/sidecar/common/response/NodeSettings.java#L48) to make the automatic determination.

   @frankgh and I talked offline, and I am re-posting the summary here.

   Let's prefer deriving vnode usage from the ring response. The problem with `NodeSettings#tokens` is that it is read from a single Sidecar instance. Cassandra does not require every node in the cluster to use vnodes. In other words, some nodes can be configured with a single token while others run with vnodes. Relying on the token count from one arbitrary node could therefore give the wrong answer.
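
   To illustrate the ring-based approach, here is a minimal sketch. It assumes the ring response can be reduced to a list of (instance address, token) pairs. `RingEntry`, `VnodeDetection`, and the method names are hypothetical and are not part of the Sidecar client or of this PR. The idea is to count ring entries per instance: if any instance owns more than one token, the ring contains repeated instances and should be de-duplicated before snapshot requests are sent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; none of these names come from the Sidecar client or the PR.
final class VnodeDetection
{
    /** Simplified ring entry: one (instance address, token) pair. */
    static final class RingEntry
    {
        final String address;
        final String token;

        RingEntry(String address, String token)
        {
            this.address = address;
            this.token = token;
        }
    }

    /** Counts how many ring entries (tokens) each instance owns, in ring order. */
    static Map<String, Integer> tokensPerInstance(List<RingEntry> ring)
    {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (RingEntry entry : ring)
        {
            counts.merge(entry.address, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns true if at least one instance in the ring owns more than one token. */
    static boolean hasVnodes(List<RingEntry> ring)
    {
        for (int count : tokensPerInstance(ring).values())
        {
            if (count > 1)
            {
                return true;
            }
        }
        return false;
    }

    /** Distinct instance addresses in first-seen order, one per snapshot request. */
    static List<String> distinctInstances(List<RingEntry> ring)
    {
        return new ArrayList<>(tokensPerInstance(ring).keySet());
    }
}
```

   Because the check looks at every instance in the ring, a mixed cluster (some single-token nodes, some vnode nodes) is still detected, which is the case a single node's token count can miss.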