[
https://issues.apache.org/jira/browse/CASSANDRA-21189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061011#comment-18061011
]
Sam Lightfoot edited comment on CASSANDRA-21189 at 2/25/26 5:20 PM:
--------------------------------------------------------------------
The error that triggers the chain of port errors comes from a Paxos commit
that times out:
{code:java}
Caused an ERROR
[2026-02-25T09:27:27.026Z] [junit-timeout]
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Can
not commit transformation: "SERVER_ERROR"(Could not perform commit; policy
Retry{remainingMs=0, attempts=2} gave up). {code}
The request_timeout is set to 1 second on the cluster builder (overriding the
10s default):
{code:java}
try (Cluster cluster = builder().withNodes(3)
                                .appendConfig(cfg ->
                                    cfg.set("progress_barrier_timeout", "5000ms")
                                       .set("request_timeout", "1000ms")
                                       .set("progress_barrier_backoff", "100ms")
{ {code}
The request_timeout effectively becomes the ceiling for the entire Paxos
commit. Because the failure is delivered as a successfully returned error
response, the commit is not retried within the significantly larger
cms_await_timeout budget.
I think a fairly safe option is to increase the 1000ms request_timeout in the
three tests where it is set, or to remove the override completely, given the
resource constraints of CI.
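To illustrate why a small budget gives up so quickly, here is a minimal, self-contained sketch of a deadline-bounded retry loop. It is not Cassandra's Retry implementation: the class DeadlineRetrySketch, the Outcome type, and the simulated attempt cost of 600ms are all hypothetical and chosen only for illustration. Time is simulated, so nothing sleeps.

```java
final class DeadlineRetrySketch {
    /** Hypothetical result type: whether an attempt succeeded, how many ran, budget left. */
    static final class Outcome {
        final boolean success;
        final int attempts;
        final long remainingMs;

        Outcome(boolean success, int attempts, long remainingMs) {
            this.success = success;
            this.attempts = attempts;
            this.remainingMs = remainingMs;
        }

        @Override
        public String toString() {
            return "Outcome{success=" + success + ", attempts=" + attempts
                   + ", remainingMs=" + remainingMs + "}";
        }
    }

    /**
     * Retries until an attempt succeeds or the total budget is spent.
     *
     * @param budgetMs         total time allowed for all attempts (the "ceiling")
     * @param attemptCostMs    simulated duration of a single attempt
     * @param succeedOnAttempt first attempt number that would succeed if it completes
     */
    static Outcome run(long budgetMs, long attemptCostMs, int succeedOnAttempt) {
        long remaining = budgetMs;
        int attempts = 0;
        while (remaining > 0) {
            attempts++;
            // An attempt only completes if it fits in what is left of the budget.
            boolean completed = attemptCostMs <= remaining;
            remaining -= Math.min(attemptCostMs, remaining);
            if (completed && attempts >= succeedOnAttempt) {
                return new Outcome(true, attempts, remaining);
            }
        }
        return new Outcome(false, attempts, remaining);
    }

    public static void main(String[] args) {
        // 1s budget: the second attempt is cut off by the deadline and the loop gives up.
        System.out.println(run(1_000, 600, 3));  // Outcome{success=false, attempts=2, remainingMs=0}
        // 10s budget: the same slow attempts leave room for the third one to succeed.
        System.out.println(run(10_000, 600, 3)); // Outcome{success=true, attempts=3, remainingMs=8200}
    }
}
```

Under these assumed numbers, a slow CI host that needs three attempts cannot finish inside a 1 second ceiling, while the same workload completes comfortably with a 10 second one.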
was (Author: JIRAUSER302824):
Appears to be due to FailedBootstrapTest not cleaning up properly, with
InProgressSequenceCoordinationTest starting immediately after. Running these
two tests sequentially reproduces the issue.
> Fix flaky DTest: InProgressSequenceCoordinationTest
> ---------------------------------------------------
>
> Key: CASSANDRA-21189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21189
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Sam Lightfoot
> Assignee: Sam Lightfoot
> Priority: Normal
> Fix For: 5.1
>
>
> There's a race condition between one test scenario's cluster shutdown and the
> next scenario's cluster startup, due to a lack of thread lifecycle handling.
> The spawned thread should be joined before the test finishes to prevent the
> 'in-use port' errors.
> Affects:
> * bootstrapProgressTest
> * decommissionProgressTest
> * replacementProgressTest
> Adopt the same pattern as GossipTest with try-finally thread joining.
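
The try-finally thread joining mentioned in the issue description can be sketched roughly as follows. This is a generic, self-contained illustration, not the code from GossipTest or the affected tests: the class JoinBeforeExitSketch, the runScenario method, the 50ms sleep standing in for the spawned operation, and the 5 second join timeout are all assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class JoinBeforeExitSketch {
    /**
     * Starts a background worker, runs the (empty) test body, and always joins
     * the worker before returning, even if the body throws.
     *
     * @return true if the worker ran to completion and is no longer alive
     */
    static boolean runScenario() {
        AtomicBoolean workerFinished = new AtomicBoolean(false);

        // Hypothetical stand-in for the thread a test spawns to drive a long operation.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            workerFinished.set(true);
        }, "scenario-worker");

        worker.start();
        try {
            // Test body would go here; assertions in it may throw.
        } finally {
            try {
                // Bounded join so a stuck worker cannot hang the test forever.
                worker.join(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return workerFinished.get() && !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker joined cleanly: " + runScenario());
    }
}
```

Joining in the finally block means the worker has stopped before the enclosing resources are released, which is the property the issue relies on to keep the next scenario from starting while the previous one is still running.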