[
https://issues.apache.org/jira/browse/CASSANDRA-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882427#comment-17882427
]
Michael Semb Wever edited comment on CASSANDRA-13704 at 9/17/24 2:42 PM:
-------------------------------------------------------------------------
I'm seeing failures in 5.0
{noformat}
failed on teardown with "Unexpected error found in node logs (see stdout for
full details). Errors: [[node1] 'ERROR [MutationStage-1] 2024-09-17
13:03:31,667 JVMStabilityInspector.java:70 - Exception in thread
Thread[MutationStage-1,10,SharedPool]
java.lang.IllegalArgumentException: Conflicting replica added (expected unique
endpoints): Full(/127.0.0.3:7000,(4611686018427387904,0]); existing:
Full(/127.0.0.3:7000,(281474976710656,0])
at org.apache.cassandra.locator.EndpointsForToken$Builder.add(EndpointsForToken.java:102)
at org.apache.cassandra.locator.EndpointsForToken$Builder.add(EndpointsForToken.java:79)
at org.apache.cassandra.locator.ReplicaCollection$Builder.addAll(ReplicaCollection.java:160)
at org.apache.cassandra.locator.ReplicaCollection$Builder.addAll(ReplicaCollection.java:166)
at org.apache.cassandra.locator.EndpointsForToken.copyOf(EndpointsForToken.java:162)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalAndPendingReplicasForToken(AbstractReplicationStrategy.java:126)
at org.apache.cassandra.service.StorageService.isEndpointValidForWrite(StorageService.java:5162)
at org.apache.cassandra.db.AbstractMutationVerbHandler.isOutOfRangeMutation(AbstractMutationVerbHandler.java:79)
at org.apache.cassandra.db.AbstractMutationVerbHandler.processMessage(AbstractMutationVerbHandler.java:53)
at org.apache.cassandra.db.AbstractMutationVerbHandler.doVerb(AbstractMutationVerbHandler.java:44)
at org.apache.cassandra.db.ReadRepairVerbHandler.doVerb(ReadRepairVerbHandler.java:35)
at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)']"
{noformat}
in
- dtest-novnode jdk11 57/64 / dtest-novnode.consistent_bootstrap_test.TestBootstrapConsistency.test_consistent_reads_after_move
- dtest-latest jdk11 43/64 / dtest-latest.materialized_views_test.TestMaterializedViews.test_insert_during_range_movement_rf3
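For readers unfamiliar with this error: the message indicates that the same endpoint (/127.0.0.3:7000) was added to a replica collection twice with two different ranges. The following is a minimal sketch of that kind of uniqueness check. `UniqueEndpointBuilder` and its string-based fields are hypothetical names invented for illustration; this is not Cassandra's `EndpointsForToken.Builder`, and the choice to silently ignore an identical re-add is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified builder that allows at most one replica per endpoint.
 * Illustration only; not the Cassandra implementation.
 */
final class UniqueEndpointBuilder {
    // endpoint -> range description, kept in insertion order
    private final Map<String, String> byEndpoint = new LinkedHashMap<>();

    /**
     * Adds a replica. Re-adding the identical (endpoint, range) pair is a no-op
     * (an assumption of this sketch); the same endpoint with a different range
     * is rejected as a conflict.
     */
    UniqueEndpointBuilder add(String endpoint, String range) {
        String existing = byEndpoint.putIfAbsent(endpoint, range);
        if (existing != null && !existing.equals(range)) {
            throw new IllegalArgumentException(
                "Conflicting replica added (expected unique endpoints): "
                + endpoint + "," + range
                + "; existing: " + endpoint + "," + existing);
        }
        return this;
    }

    /** Number of distinct endpoints added so far. */
    int size() {
        return byEndpoint.size();
    }
}
```

With this sketch, adding `/127.0.0.3:7000` first with `(281474976710656,0]` and then with `(4611686018427387904,0]` throws an `IllegalArgumentException`, mirroring the shape of the failure in the log above.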
> Safer handling of out of range tokens
> -------------------------------------
>
> Key: CASSANDRA-13704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13704
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Coordination, Legacy/Observability
> Reporter: Sam Tunnicliffe
> Assignee: Caleb Rackliffe
> Priority: Urgent
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
> Attachments: CASSANDRA-13704_5-0_23_ci_summary.html,
> CASSANDRA-13704_5-0_23_results_details.tar.xz,
> CASSANDRA-13704_5-0_24_ci_summary.html,
> CASSANDRA-13704_5-0_24_results_details.tar.xz, ci_summary-1.html,
> ci_summary-2.html, ci_summary.html, result_details.tar-1.gz,
> result_details.tar-2.gz, result_details.tar.gz
>
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> It is possible for nodes to have a divergent view of the ring, which can
> result in some operations being sent to the wrong nodes. This is an umbrella
> ticket to mitigate such issues by adding logging when a node is asked to
> perform an operation for tokens it does not own. This will be useful for
> detecting when the nodes' views of the ring diverge, which is not highly
> visible at the moment, and also for post-hoc analysis.
> It may also be beneficial to straight up reject certain operations, though
> this will need to balance the risk of performing those ops against the
> consequences rejecting them has on availability.
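The idea in the description can be sketched roughly as follows: before applying an operation, a node checks whether the token falls inside a range it owns, and either logs or rejects when it does not. This is only an illustration under stated assumptions. `TokenRange`, `OwnershipCheck`, `Action`, and the `rejectOutOfRange` flag are hypothetical names, tokens are modelled as plain `long` values, and ranges are treated as (left, right] with wrap-around when left >= right. None of this is taken from the actual patch.

```java
import java.util.List;

/** Hypothetical, simplified token range (left exclusive, right inclusive). */
final class TokenRange {
    final long left;
    final long right;

    TokenRange(long left, long right) {
        this.left = left;
        this.right = right;
    }

    /** True if token lies in (left, right]; left >= right means the range wraps around the ring. */
    boolean contains(long token) {
        if (left < right) {
            return token > left && token <= right;
        }
        // Wrapping range: covers everything after left plus everything up to right.
        return token > left || token <= right;
    }
}

/** Hypothetical ownership check: accept, log only, or reject. */
final class OwnershipCheck {
    enum Action { ACCEPT, LOG_ONLY, REJECT }

    /**
     * Returns ACCEPT if any owned range contains the token. Otherwise returns
     * REJECT when rejectOutOfRange is set, or LOG_ONLY when it is not.
     */
    static Action check(List<TokenRange> owned, long token, boolean rejectOutOfRange) {
        for (TokenRange range : owned) {
            if (range.contains(token)) {
                return Action.ACCEPT;
            }
        }
        return rejectOutOfRange ? Action.REJECT : Action.LOG_ONLY;
    }
}
```

Keeping logging and rejection behind separate outcomes reflects the trade-off the description raises: logging alone has no availability cost, while rejection protects correctness at the risk of failing operations when ring views diverge.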
--
This message was sent by Atlassian Jira
(v8.20.10#820010)