Caleb Rackliffe created CASSANDRA-18347:
-------------------------------------------
Summary: CEP-21: Startup failures in Python dtests around
TCM_REPLAY_REQ
Key: CASSANDRA-18347
URL: https://issues.apache.org/jira/browse/CASSANDRA-18347
Project: Cassandra
Issue Type: Bug
Reporter: Caleb Rackliffe
There are currently widespread, locally reproducible failures in the Python
dtests against the {{cep-21-tcm}} branch. For example...
{noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra topology_test.py::TestTopology::test_decommissioned_node_cant_rejoin{noformat}
{noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra materialized_views_test.py::TestMaterializedViews::test_query_new_column{noformat}
{noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra read_repair_test.py::TestSpeculativeReadRepair::test_normal_read_repair{noformat}
https://app.circleci.com/pipelines/github/maedhroz/cassandra/701/workflows/44a5c7e0-0de0-4839-bbd0-80771fe23843/jobs/7251
https://app.circleci.com/pipelines/github/beobal/cassandra/406/workflows/00cdb02e-4b3e-477a-b997-403121172249/jobs/4204/tests
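To re-run the three example tests in one go, something like the following sketch may help. It only assembles the same pytest invocations listed above; the helper names ({{build_commands}}, {{run_all}}) and the {{dtest_dir}} parameter are hypothetical, and it assumes pytest is launched from a checkout of the Python dtests with {{cassandra_dir}} pointing at a local build of the {{cep-21-tcm}} branch.

```python
import subprocess

# Test IDs copied from the examples in this report.
FAILING_TESTS = [
    "topology_test.py::TestTopology::test_decommissioned_node_cant_rejoin",
    "materialized_views_test.py::TestMaterializedViews::test_query_new_column",
    "read_repair_test.py::TestSpeculativeReadRepair::test_normal_read_repair",
]


def build_commands(cassandra_dir):
    """Return one pytest argument list per failing test."""
    return [
        ["pytest", f"--cassandra-dir={cassandra_dir}", test_id]
        for test_id in FAILING_TESTS
    ]


def run_all(cassandra_dir, dtest_dir="."):
    """Run each test in turn and map test ID -> pytest exit code.

    dtest_dir is assumed to be the directory containing the dtest files.
    """
    results = {}
    for cmd in build_commands(cassandra_dir):
        completed = subprocess.run(cmd, cwd=dtest_dir)
        results[cmd[-1]] = completed.returncode
    return results
```

Calling {{run_all("/path/to/cassandra")}} from the dtest checkout runs the tests sequentially; a non-zero exit code for a test means pytest reported a failure or error for it.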
The death spiral in the node startup logs starts like this…
{noformat}
WARN [Messaging-EventLoop-3-1] 2023-03-17 11:55:34,037 NoSpamLogger.java:108 -
/127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message
of type TCM_REPLAY_REQ whose timeout expired before reaching the network
ERROR [InternalResponseStage:3] 2023-03-17 11:55:34,038
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.1:7000], checkLive=false}
INFO [Messaging-EventLoop-3-12] 2023-03-17 11:55:34,099
InboundConnectionInitiator.java:567 -
/127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-1b9301b6
messaging connection established, version = 13, framing = CRC, encryption =
unencrypted
INFO [Messaging-EventLoop-3-9] 2023-03-17 11:55:34,099
OutboundConnection.java:1164 -
/127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-a9302b2e
successfully connected, version = 13, framing = CRC, encryption = unencrypted
WARN [InternalMetadataStage:5] 2023-03-17 11:55:34,100 NoSpamLogger.java:108 -
Not currently a member of the CMS
INFO [Messaging-EventLoop-3-13] 2023-03-17 11:55:34,102
InboundConnectionInitiator.java:567 -
/127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-f887f6fa
messaging connection established, version = 13, framing = CRC, encryption =
unencrypted
INFO [Messaging-EventLoop-3-11] 2023-03-17 11:55:34,102
OutboundConnection.java:1164 -
/127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-5cd0c637
successfully connected, version = 13, framing = CRC, encryption = unencrypted
ERROR [InternalResponseStage:4] 2023-03-17 11:55:49,237
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
WARN [InternalMetadataStage:8] 2023-03-17 11:55:49,394 NoSpamLogger.java:108 -
Not currently a member of the CMS
WARN [Messaging-EventLoop-3-1] 2023-03-17 11:56:04,636 NoSpamLogger.java:108 -
/127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message
of type TCM_REPLAY_REQ whose timeout expired before reaching the network
ERROR [InternalResponseStage:5] 2023-03-17 11:56:04,637
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.1:7000,
/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000,
/127.0.0.1:7000], checkLive=false}
WARN [InternalMetadataStage:11] 2023-03-17 11:56:04,892 NoSpamLogger.java:108
- Not currently a member of the CMS
...
ERROR [InternalResponseStage:6] 2023-03-17 11:56:20,335
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.1:7000], checkLive=false}
WARN [InternalMetadataStage:14] 2023-03-17 11:56:20,391 NoSpamLogger.java:108
- Not currently a member of the CMS
ERROR [InternalResponseStage:7] 2023-03-17 11:56:21,750
RemoteProcessor.java:164 - Got error from /127.0.0.3:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.1:7000,
/127.0.0.2:7000, /127.0.0.1:7000,
/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000],
checkLive=false}
WARN [Messaging-EventLoop-3-1] 2023-03-17 11:56:35,535 NoSpamLogger.java:108 -
/127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message
of type TCM_REPLAY_REQ whose timeout expired before reaching the network
ERROR [InternalResponseStage:8] 2023-03-17 11:56:35,537
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
WARN [InternalMetadataStage:17] 2023-03-17 11:56:35,693 NoSpamLogger.java:108
- Not currently a member of the CMS
ERROR [InternalResponseStage:9] 2023-03-17 11:56:37,135
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000,
/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000],
checkLive=false}
WARN [InternalMetadataStage:20] 2023-03-17 11:56:37,540 NoSpamLogger.java:108
- Not currently a member of the CMS
ERROR [InternalResponseStage:10] 2023-03-17 11:56:50,935
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.1:7000,
/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000,
/127.0.0.1:7000], checkLive=false}
WARN [InternalMetadataStage:23] 2023-03-17 11:56:51,191 NoSpamLogger.java:108
- Not currently a member of the CMS
{noformat}
...and ends here:
{noformat}
ERROR [InternalResponseStage:11] 2023-03-17 11:56:53,036
RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending
TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.1:7000,
/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000,
/127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000,
/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
Exception (java.lang.IllegalStateException) encountered during startup: Could
not succeed sending TCM_REPLAY_REQ to
CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000,
/127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000],
checkLive=false} after 10 tries
ERROR [main] 2023-03-17 11:56:53,546 CassandraDaemon.java:929 - Exception
encountered during startup
java.lang.IllegalStateException: Could not succeed sending TCM_REPLAY_REQ to
CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000,
/127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.3:7000,
/127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000,
/127.0.0.3:7000, /127.0.0.1:7000], checkLive=false} after 10 tries
at org.apache.cassandra.tcm.RemoteProcessor.sendWithCallback(RemoteProcessor.java:181)
at org.apache.cassandra.tcm.RemoteProcessor.replayAndWait(RemoteProcessor.java:118)
at org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.replayAndWait(ClusterMetadataService.java:577)
at org.apache.cassandra.tcm.Startup.initializeForDiscovery(Startup.java:149)
at org.apache.cassandra.tcm.Startup.initialize(Startup.java:84)
at org.apache.cassandra.tcm.Startup.initialize(Startup.java:59)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:267)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:777)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:907)
...
{noformat}
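When triaging a failed run, it can help to tally which peers the {{TCM_REPLAY_REQ}} timeouts came from and whether the node eventually gave up. The sketch below is one possible way to do that; it assumes the log messages have the same wording as the excerpts above, and the names {{summarize}}, {{TIMEOUT_RE}} and {{GIVE_UP_RE}} are hypothetical helpers, not part of Cassandra or the dtest framework.

```python
import re
from collections import Counter

# Matches the RemoteProcessor retry errors quoted in this report.
# \s+ tolerates line wrapping introduced by email or Jira rendering.
TIMEOUT_RE = re.compile(
    r"Got error from (/[\d.]+:\d+): TIMEOUT when sending\s+(\w+)"
)

# Matches the final startup failure and captures the number of tries.
GIVE_UP_RE = re.compile(
    r"Could\s+not\s+succeed\s+sending\s+(\w+)\s+to\s+.*?after\s+(\d+)\s+tries",
    re.DOTALL,
)


def summarize(log_text, verb="TCM_REPLAY_REQ"):
    """Return (timeouts per peer, tries before giving up or None)."""
    timeouts = Counter(
        match.group(1)
        for match in TIMEOUT_RE.finditer(log_text)
        if match.group(2) == verb
    )
    give_up = GIVE_UP_RE.search(log_text)
    tries = int(give_up.group(2)) if give_up and give_up.group(1) == verb else None
    return timeouts, tries
```

Passing the contents of a node's log file to {{summarize}} returns a counter keyed by peer address plus the retry count from the final {{IllegalStateException}}, if one was logged.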