[ https://issues.apache.org/jira/browse/CASSANDRA-19879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877770#comment-17877770 ]
Dmitry Konstantinov edited comment on CASSANDRA-19879 at 8/29/24 5:22 PM:
--------------------------------------------------------------------------

Possible ways to make the logic more stable:
* use spinAssertEquals for ApplicationState within PullSchemaFrom
* change the test to make it similar to the trunk version. The trunk version was changed here: https://github.com/krummas/cassandra/commit/c95cac2a0bf272aacea81022a5256c5eb454c879 / https://github.com/krummas/cassandra/compare/02be7aa71c...6491a70041 (CASSANDRA-18791)

was (Author: dnk):
Possible ways to make the logic more stable:
* use spinAssertEquals for ApplicationState within PullSchemaFrom
* change the test to make it similar to trunk version

> distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest fails sometimes
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19879
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19879
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Bootstrap and Decommission
>            Reporter: Dmitry Konstantinov
>            Priority: Low
>             Fix For: 5.0.x
>
>
> The JUnit test org.apache.cassandra.distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest
> may occasionally fail with an NPE:
> {code:java}
> java.lang.NullPointerException: Cannot invoke "org.apache.cassandra.gms.EndpointState.getApplicationState(org.apache.cassandra.gms.ApplicationState)" because "state" is null
>     at org.apache.cassandra.distributed.action.GossipHelper$PullSchemaFrom.lambda$accept$6adea493$1(GossipHelper.java:245)
>     at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$async$10(IsolatedExecutor.java:156)
>     at org.apache.cassandra.concurrent.FutureTask$2.call(FutureTask.java:124)
>     at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>     at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:840){code}
> Observed during testing of CASSANDRA-19651.
> It is not easy to reproduce.
> As a part of Instance.startup, org.apache.cassandra.gms.Gossiper#waitToSettle
> waits for 5 + 3 x 1 = 8 seconds if there are no changes in the number of nodes
> discovered using gossip (even if we have not had any interactions with other
> nodes using gossip at all).
> I added a 5-second sleep to org.apache.cassandra.gms.Gossiper.GossipTask#run
> (we also have a 1-second initial delay when we schedule GossipTask)
> {code}
> private class GossipTask implements Runnable
> {
>     public void run()
>     {
>         try
>         {
>             //wait on messaging service to start listening
>             MessagingService.instance().waitUntilListening();
>             Thread.sleep(5000); // <===============================
>             taskLock.lock();
> {code}
> and the NPE was reproduced more frequently.
> So it looks like the test may fail if, for some reason, GossipTask has not had
> a chance to run before EndpointState.getApplicationState is invoked as a part
> of the test logic.
> Note: In 5.1 the test is different and does not have the pullSchemaFrom logic at
> all.
> A conversation about the issue was started in CASSANDRA-19651.
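
The first option in the comment (spinning until the ApplicationState is available instead of reading it once) can be illustrated with a minimal, self-contained sketch. It does not use Cassandra classes: `SpinWaitSketch`, `awaitNonNull`, and the map standing in for gossip endpoint state are all hypothetical names chosen for illustration, and the delayed thread only simulates a GossipTask that has not run yet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration only; not the Cassandra test code.
class SpinWaitSketch
{
    /**
     * Polls the supplier until it returns a non-null value.
     * Throws AssertionError if the timeout elapses first.
     */
    static <T> T awaitNonNull(Supplier<T> supplier, long timeout, TimeUnit unit)
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true)
        {
            T value = supplier.get();
            if (value != null)
                return value;
            if (System.nanoTime() - deadline >= 0)
                throw new AssertionError("value still null after " + timeout + " " + unit);
            try
            {
                Thread.sleep(10); // short pause between polls
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        // Stand-in for gossip endpoint state: empty until the "gossip" thread has run.
        Map<String, String> endpointStates = new ConcurrentHashMap<>();

        Thread gossip = new Thread(() -> {
            try
            {
                Thread.sleep(200); // simulates a delayed first gossip round
            }
            catch (InterruptedException e)
            {
                return;
            }
            endpointStates.put("127.0.0.1", "schema-v1");
        });
        gossip.start();

        // A single endpointStates.get(...) here could return null (the NPE case);
        // polling tolerates the delay instead.
        String schema = awaitNonNull(() -> endpointStates.get("127.0.0.1"), 5, TimeUnit.SECONDS);
        System.out.println(schema);
        gossip.join();
    }
}
```

The trade-off is that a bounded poll turns a rare race into a deterministic wait with a clear timeout failure, at the cost of a slightly slower test when gossip is late.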