yifan-c commented on code in PR #93: URL: https://github.com/apache/cassandra-analytics/pull/93#discussion_r2078084519
########## cassandra-analytics-integration-tests/src/test/java/org/apache/cassandra/analytics/testcontainer/BulkWriteS3CompatModeSimpleMultipleTokensTestImpl.java: ########## @@ -0,0 +1,12 @@ +package org.apache.cassandra.analytics.testcontainer; + +import org.apache.cassandra.testing.ClusterBuilderConfiguration; + +public class BulkWriteS3CompatModeSimpleMultipleTokensTestImpl extends BulkWriteS3CompatModeSimpleTest { Review Comment: maybe rename to `BulkWriteS3CompatModeSimpleMultipleTokensTest`? The ending `Impl` is probably added by mistake. ########## cassandra-analytics-integration-tests/src/test/java/org/apache/cassandra/analytics/CassandraAnalyticsSimpleTest.java: ########## @@ -19,29 +19,25 @@ package org.apache.cassandra.analytics; -import java.util.Arrays; -import java.util.HashMap; -import java.util.List; -import java.util.Map; -import java.util.function.Function; -import java.util.stream.Stream; - -import org.junit.jupiter.api.Timeout; -import org.junit.jupiter.params.ParameterizedTest; -import org.junit.jupiter.params.provider.Arguments; -import org.junit.jupiter.params.provider.MethodSource; - import org.apache.cassandra.sidecar.testing.QualifiedName; import org.apache.cassandra.spark.bulkwriter.WriterOptions; import org.apache.cassandra.testing.ClusterBuilderConfiguration; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.SparkSession; +import org.junit.jupiter.api.Timeout; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.function.Function; +import java.util.stream.Stream; -import static org.apache.cassandra.testing.TestUtils.CREATE_TEST_TABLE_STATEMENT; -import static org.apache.cassandra.testing.TestUtils.DC1_RF3; -import static org.apache.cassandra.testing.TestUtils.ROW_COUNT; -import static org.apache.cassandra.testing.TestUtils.TEST_KEYSPACE; +import static org.apache.cassandra.testing.TestUtils.*; Review Comment: codestyle nit: the project prefers explicit imports over wildcard import. ########## cassandra-analytics-integration-tests/src/test/java/org/apache/cassandra/analytics/BulkWriteDownSidecarMultipleTokensTest.java: ########## @@ -0,0 +1,12 @@ +package org.apache.cassandra.analytics; + +import org.apache.cassandra.testing.ClusterBuilderConfiguration; + +public class BulkWriteDownSidecarMultipleTokensTest extends BulkWriteDownSidecarTest { + @Override + protected ClusterBuilderConfiguration testClusterConfiguration() + { + return super.testClusterConfiguration() + .tokenCount(4); Review Comment: nit: align with `.` ```suggestion return super.testClusterConfiguration() .tokenCount(4); ``` ########## cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/ClientConfig.java: ########## @@ -53,6 +53,11 @@ public class ClientConfig public static final String SNAPSHOT_NAME_KEY = "snapshotName"; public static final String DC_KEY = "dc"; public static final String CREATE_SNAPSHOT_KEY = "createSnapshot"; + /** + * Option to filter distinct instances before creating snapshots. This is only applicable when + * using vnodes where the token ring will contain multiple entries per instance. + */ + public static final String CREATE_SNAPSHOT_FILTER_DISTINCT_INSTANCES_KEY = "createSnapshotFilterDistinctInstances"; Review Comment: IMO, a flag is wanted, when (1) there is nothing in the system to derive a condition (that alters behavior) and it requires an external signal, or (2) explicitness is _required_. I think in this case, if we can reliably derive the condition from the ring response, it would void the need for the flag. Beside, it might cause confusion when cluster is running vnode but `createSnapshotFilterDistinctInstances` is set to `false`. I think the expectation is to only create snapshot once on each physical host always. If true, the flag is also unnecessary. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org