[ 
https://issues.apache.org/jira/browse/CASSANDRA-21185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061946#comment-18061946
 ] 

Sam Lightfoot commented on CASSANDRA-21185:
-------------------------------------------

I ran the change to unconditionally add the seed nodes through CI. The 
bootstrap tests still fail, but at a later stage, due to duplicate token 
allocation:
{panel:title=Token Allocation Race (local run, node C)}
{{ERROR [main] 2026-02-28T12:03:40,195 CassandraDaemon.java:1017 - Exception encountered during startup
java.lang.IllegalStateException: Can not commit transformation: "INVALID"
  (Rejecting this plan as some tokens are already assigned:
    [-2659780865256175721 (node 2|/127.0.0.2:7000),
     -4436269193111982995 (node 2|/127.0.0.2:7000),
     -7126313408081543240 (node 2|/127.0.0.2:7000),
     -4009982691830853605 (node 2|/127.0.0.2:7000),
     1777711142818670636 (node 2|/127.0.0.2:7000),
     8757199486592871928 (node 2|/127.0.0.2:7000),
     -8337993754753524387 (node 2|/127.0.0.2:7000),
     -1602404348327217504 (node 2|/127.0.0.2:7000),
     -8753631414325164267 (node 2|/127.0.0.2:7000),
     519290551675017603 (node 2|/127.0.0.2:7000),
     6539120763084666461 (node 2|/127.0.0.2:7000),
     -6728990555732830449 (node 2|/127.0.0.2:7000),
     -5470569839787387361 (node 2|/127.0.0.2:7000),
     7708075686263680840 (node 2|/127.0.0.2:7000),
     2842052132591734033 (node 2|/127.0.0.2:7000),
     4662056433571680432 (node 2|/127.0.0.2:7000)])
        at o.a.c.tcm.ClusterMetadataService.lambda$commit$6(ClusterMetadataService.java:581)
        at o.a.c.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:625)
        at o.a.c.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:578)
        at o.a.c.tcm.Startup.startup(Startup.java:452)
        at o.a.c.tcm.Startup.startup(Startup.java:418)
        at o.a.c.service.StorageService.joinRing(StorageService.java:970)
        at o.a.c.service.StorageService.initServer(StorageService.java:865)
        at o.a.c.service.CassandraDaemon.setup(CassandraDaemon.java:396)
        at o.a.c.service.CassandraDaemon.activate(CassandraDaemon.java:836)
        at o.a.c.service.CassandraDaemon.main(CassandraDaemon.java:990)}}
{panel}
From what I can see, no retry triggers token {*}re{*}-allocation, so every 
retry resubmits the same tokens. I hacked in [retries for commit 
initialTransformation|https://github.com/apache/cassandra/commit/93614be524a87a552cfcc0f96328300e8a47fc89], 
which resolves the issue by ensuring that each retry requests non-conflicting 
tokens.
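
To illustrate the difference between retrying the same proposal and re-allocating on retry, here is a toy Python model. It is only a sketch and not the linked patch or any Cassandra API: {{ToyMetadataLog}}, {{TokenConflict}}, {{free_tokens}} and {{join_with_reallocation}} are hypothetical names, and the 16-slot ring is an assumption made to keep the example deterministic.

```python
class TokenConflict(Exception):
    """Raised when a proposed token is already owned by another node."""


class ToyMetadataLog:
    """Hypothetical stand-in for the cluster metadata log (not a Cassandra API)."""

    def __init__(self):
        self.owners = {}  # token -> name of the owning node

    def commit(self, node, tokens):
        # Reject the whole plan if any token is already assigned.
        taken = sorted(t for t in tokens if t in self.owners)
        if taken:
            raise TokenConflict(f"tokens already assigned: {taken}")
        for t in tokens:
            self.owners[t] = node


def free_tokens(log, count, ring=range(16)):
    """Pick the first `count` ring positions that the log has not assigned yet."""
    return [t for t in ring if t not in log.owners][:count]


def join_with_reallocation(log, node, allocate, max_attempts=5):
    """Commit a join, calling allocate() again after every rejected attempt."""
    for attempt in range(1, max_attempts + 1):
        tokens = allocate()  # fresh proposal each time, not the rejected one
        try:
            log.commit(node, tokens)
            return tokens, attempt
        except TokenConflict:
            continue
    raise RuntimeError(f"{node}: no conflict-free tokens after {max_attempts} attempts")


# Simulate the race: both nodes computed the same proposal from the same view.
log = ToyMetadataLog()
stale = free_tokens(log, 4)
log.commit("node2", stale)  # node 2 wins

proposals = iter([stale])  # node 3 starts from the stale proposal


def allocate_for_node3():
    # First call returns the stale proposal; later calls consult the current log.
    return next(proposals, None) or free_tokens(log, 4)


tokens, attempts = join_with_reallocation(log, "node3", allocate_for_node3)
print(tokens, attempts)  # prints: [4, 5, 6, 7] 2
```

If {{allocate}} returned the same tokens on every call, as in the current retry behaviour described above, the loop would exhaust its attempts and raise instead of converging.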

[~samt] 

 

> Fix flaky DTest: bootstrap_test_*
> ---------------------------------
>
>                 Key: CASSANDRA-21185
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21185
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Sam Lightfoot
>            Assignee: Sam Lightfoot
>            Priority: Normal
>             Fix For: 5.1
>
>         Attachments: split_brain_logs.txt
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tests often fail because no seed node is up while non-seeds are trying to 
> join the ring. Likely fix: start the seed node first to ensure CMS 
> initialization is complete, then allow the other nodes in CCM to start in 
> parallel.
> Affects 5.1+ because <=5.0 uses [sequential 
> startup|https://github.com/apache/cassandra-dtest/blob/trunk/bootstrap_test.py#L254C30-L254C32].
> On further analysis, it appears two separate clusters form due to the seed 
> node not accepting messages during CMS initialization. Attached logs show the 
> independent clusters resulting from 
> _bootstrap_test.py::TestBootstrap::test_read_from_bootstrapped_node._



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
