[
https://issues.apache.org/jira/browse/CASSANDRA-21185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061946#comment-18061946
]
Sam Lightfoot commented on CASSANDRA-21185:
-------------------------------------------
I ran the change to unconditionally add the seed nodes through CI. The
bootstrap tests still fail, but at a later stage, due to duplicate token
allocation:
{panel:title=Token Allocation Race (local run, node C)}
{{ERROR [main] 2026-02-28T12:03:40,195 CassandraDaemon.java:1017 - Exception encountered during startup
java.lang.IllegalStateException: Can not commit transformation: "INVALID" (Rejecting this plan as some tokens are already assigned:
[-2659780865256175721 (node 2|/127.0.0.2:7000),
-4436269193111982995 (node 2|/127.0.0.2:7000),
-7126313408081543240 (node 2|/127.0.0.2:7000),
-4009982691830853605 (node 2|/127.0.0.2:7000),
1777711142818670636 (node 2|/127.0.0.2:7000),
8757199486592871928 (node 2|/127.0.0.2:7000),
-8337993754753524387 (node 2|/127.0.0.2:7000),
-1602404348327217504 (node 2|/127.0.0.2:7000),
-8753631414325164267 (node 2|/127.0.0.2:7000),
519290551675017603 (node 2|/127.0.0.2:7000),
6539120763084666461 (node 2|/127.0.0.2:7000),
-6728990555732830449 (node 2|/127.0.0.2:7000),
-5470569839787387361 (node 2|/127.0.0.2:7000),
7708075686263680840 (node 2|/127.0.0.2:7000),
2842052132591734033 (node 2|/127.0.0.2:7000),
4662056433571680432 (node 2|/127.0.0.2:7000)])
at o.a.c.tcm.ClusterMetadataService.lambda$commit$6(ClusterMetadataService.java:581)
at o.a.c.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:625)
at o.a.c.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:578)
at o.a.c.tcm.Startup.startup(Startup.java:452)
at o.a.c.tcm.Startup.startup(Startup.java:418)
at o.a.c.service.StorageService.joinRing(StorageService.java:970)
at o.a.c.service.StorageService.initServer(StorageService.java:865)
at o.a.c.service.CassandraDaemon.setup(CassandraDaemon.java:396)
at o.a.c.service.CassandraDaemon.activate(CassandraDaemon.java:836)
at o.a.c.service.CassandraDaemon.main(CassandraDaemon.java:990)}}
{panel}
From what I can see, none of the existing retries trigger token
{*}re{*}-allocation, so each retry simply resubmits the same tokens. I hacked
in [retries for commit
initialTransformation|https://github.com/apache/cassandra/commit/93614be524a87a552cfcc0f96328300e8a47fc89],
which resolves the issue by ensuring that each retry requests non-conflicting
tokens.
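To illustrate the idea only, here is a simplified Python sketch rather than the Cassandra code or the linked commit. It assumes the conflict arises because two nodes pick tokens from the same stale view of the ring; {{FakeMetadataLog}}, {{allocate_tokens}} and {{join_with_retries}} are hypothetical names made up for this example.

```python
import random


class TokenConflict(Exception):
    """Raised when a proposed plan contains tokens that are already owned."""


class FakeMetadataLog:
    """Hypothetical stand-in for the cluster metadata log; not a Cassandra API."""

    def __init__(self):
        self.owners = {}  # token -> id of the node that owns it

    def commit(self, node_id, tokens):
        # Reject the whole plan if any token is already assigned.
        taken = [t for t in tokens if t in self.owners]
        if taken:
            raise TokenConflict(f"tokens already assigned: {taken}")
        for t in tokens:
            self.owners[t] = node_id


def allocate_tokens(log, num_tokens, rng):
    """Pick random tokens that are unowned in the metadata visible right now."""
    tokens = set()
    while len(tokens) < num_tokens:
        candidate = rng.randrange(-2**63, 2**63)
        if candidate not in log.owners:
            tokens.add(candidate)
    return sorted(tokens)


def join_with_retries(log, node_id, tokens, rng, max_attempts=3):
    """Commit `tokens`; on a conflict, re-allocate before the next attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            log.commit(node_id, tokens)
            return tokens, attempt
        except TokenConflict:
            if attempt == max_attempts:
                raise
            # The important part: ask for fresh tokens against the current
            # metadata instead of resubmitting the rejected ones.
            tokens = allocate_tokens(log, len(tokens), rng)


rng = random.Random(0)
log = FakeMetadataLog()
stale_plan = allocate_tokens(log, 16, rng)  # both nodes computed this plan
log.commit(2, stale_plan)                   # node 2 wins the race
tokens, attempts = join_with_retries(log, 3, stale_plan, rng)
print(attempts, set(tokens).isdisjoint(stale_plan))
```

In this toy model node 3 is rejected on its first attempt and succeeds on the second with tokens that do not overlap node 2's. If the retry reused {{stale_plan}}, every attempt would be rejected the same way, which matches the behaviour described above.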
[~samt]
> Fix flaky DTest: bootstrap_test_*
> ---------------------------------
>
> Key: CASSANDRA-21185
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21185
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: Sam Lightfoot
> Assignee: Sam Lightfoot
> Priority: Normal
> Fix For: 5.1
>
> Attachments: split_brain_logs.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Tests often fail because no seed node is up whilst non-seeds are trying to
> join the ring. The likely fix is to start the seed node first to ensure CMS
> initialization is complete, then allow the other nodes in CCM to start in
> parallel.
> Affects only 5.1+, because <=5.0 uses [sequential
> startup|https://github.com/apache/cassandra-dtest/blob/trunk/bootstrap_test.py#L254C30-L254C32].
> On further analysis it appears two separate clusters form due to the seed
> node not accepting messages during CMS initialization. Attached logs show the
> independent clusters resulting from
> _bootstrap_test.py::TestBootstrap::test_read_from_bootstrapped_node._