You may need to look at the zipped (rotated) log files if streaming had
been running for a while before it failed. The underlying error could
have occurred hours or days before the final failure.
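As a rough sketch of that search, the script below scans every file in a log directory, including rotated .zip or .gz archives, for lines containing "ERROR". The function names (iter_log_lines, find_errors) are made up for illustration, and the rotation format is an assumption: check which archive suffixes your logback configuration actually produces.

```python
import gzip
import io
import sys
import zipfile
from pathlib import Path
from typing import Iterator, Tuple


def iter_log_lines(path: Path) -> Iterator[str]:
    """Yield text lines from a plain, .gz, or .zip log file."""
    if path.suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                with archive.open(member) as raw:
                    yield from io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
    elif path.suffix == ".gz":
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            yield from handle
    else:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            yield from handle


def find_errors(log_dir: str, needle: str = "ERROR") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for each line containing needle."""
    root = Path(log_dir)
    if not root.is_dir():
        return  # nothing to scan
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        for number, line in enumerate(iter_log_lines(path), start=1):
            if needle in line:
                yield path.name, number, line.rstrip("\n")


if __name__ == "__main__":
    # /var/log/cassandra is the directory mentioned in this thread;
    # pass a different directory as the first argument if yours differs.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/log/cassandra"
    for name, number, line in find_errors(log_dir):
        print(f"{name}:{number}: {line}")
```

Running it on both the joining node and the nodes it streams from is worthwhile, since the first ERROR may be logged on the sending side.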
If your cluster is already experiencing performance issues (e.g. due to
CPU bottleneck or GC pauses), it's highly likely that these are related
to the streaming failure.
On 15/08/2024 13:41, Joe Obernberger wrote:
Thank you Bowen - yeah the only ERROR I see in
/var/log/cassandra/debug.log is:
ERROR [main] 2024-08-15 04:48:23,374 StorageService.java:2041 - Error
while waiting on bootstrap to complete. Bootstrap will have to be
restarted.
java.util.concurrent.ExecutionException:
org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.utils.concurrent.AbstractFuture.getWhenDone(AbstractFuture.java:239)
at
org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:246)
at
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:2034)
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1185)
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1145)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:936)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:854)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:421)
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:744)
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:878)
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:243)
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:205)
at
org.apache.cassandra.streaming.StreamSession.lambda$closeSession$2(StreamSession.java:517)
at
org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
at
org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
at
org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
The cluster is running inside Kubernetes on bare metal with a
NetApp for storage. I'd love a way to double the number of nodes, but
it sounds like I shouldn't have let it get this far. We're having some
odd performance issues on reads that I'm diagnosing.
-Joe
On 8/14/2024 5:07 PM, Bowen Song via user wrote:
It looks like all your nodes are in the same DC and the same rack
with 256 vnodes each. It's very hard (if not impossible) to add
multiple nodes to the same DC concurrently and safely in this setup.
You are better off adding one node at a time to this cluster.
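If you script the one-at-a-time approach, one possible gate is to check that no node is still joining, leaving, moving or down before starting the next one. The sketch below parses the text printed by `nodetool status`; the helper names (nodes_not_normal, safe_to_add_next_node) are hypothetical, and "every node is UN" is only an assumed minimum condition, not a complete safety check.

```python
import re
from typing import List, Tuple

# Each node row starts with a two-letter code:
# U/D (Up/Down) followed by N/L/J/M (Normal/Leaving/Joining/Moving).
_NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def nodes_not_normal(status_output: str) -> List[Tuple[str, str]]:
    """Return (address, state) for every node that is not Up/Normal."""
    pending = []
    for line in status_output.splitlines():
        match = _NODE_ROW.match(line)
        if match and match.group(1) != "UN":
            pending.append((match.group(2), match.group(1)))
    return pending


def safe_to_add_next_node(status_output: str) -> bool:
    """True only when every listed node reports UN."""
    return not nodes_not_normal(status_output)
```

For the status output quoted further down in this thread, nodes_not_normal would report cassandra-15 as UJ, so the next node should wait until that bootstrap has finished (or been resumed with `nodetool bootstrap resume`) and the node shows UN.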
Try searching for "ERROR" in the logs; it should tell you why the
streaming session failed. If you can find the cause of the failure, you
may be able to prevent or reduce the chance of it happening again in
the future.
On 14/08/2024 21:50, Joe Obernberger wrote:
Hi all - when adding a node to our existing 15-node cluster, I get:
DEBUG [NonPeriodicTasks:1] 2024-08-14 20:34:10,383
StreamCoordinator.java:152 - Finished connecting all sessions
WARN [NonPeriodicTasks:1] 2024-08-14 20:34:10,385
StreamResultFuture.java:242 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Stream failed
DEBUG [NonPeriodicTasks:1] 2024-08-14 20:34:10,386
StreamSession.java:529 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Will close attached inbound
{3d52d53f=org.apache.cassandra.streaming.async.NettyStreamingChannel@2e6b7635,
67655215=org.apache.cassandra.streaming.async.NettyStreamingChannel@11e1a7e5,
d4d397f3=org.apache.cassandra.streaming.async.NettyStreamingChannel@11959fa4,
50e7cefb=org.apache.cassandra.streaming.async.NettyStreamingChannel@7a69edc0,
dfbbe8cd=org.apache.cassandra.streaming.async.NettyStreamingChannel@f46e666,
55d116ad=org.apache.cassandra.streaming.async.NettyStreamingChannel@1304a6e4,
5cf05913=org.apache.cassandra.streaming.async.NettyStreamingChannel@2ed1739f,
833323b4=org.apache.cassandra.streaming.async.NettyStreamingChannel@1be38c68,
1146e1c5=org.apache.cassandra.streaming.async.NettyStreamingChannel@232e3ca4,
f983d95c=org.apache.cassandra.streaming.async.NettyStreamingChannel@152b8da3,
3f061317=org.apache.cassandra.streaming.async.NettyStreamingChannel@30434912,
0c0b1395=org.apache.cassandra.streaming.async.NettyStreamingChannel@374403be,
30dafd8c=org.apache.cassandra.streaming.async.NettyStreamingChannel@65ddc2ee,
28c0fe3c=org.apache.cassandra.streaming.async.NettyStreamingChannel@2cd20d63,
55d1221c=org.apache.cassandra.streaming.async.NettyStreamingChannel@6f3ca32,
4fca9827=org.apache.cassandra.streaming.async.NettyStreamingChannel@113d70a1,
bea79e0c=org.apache.cassandra.streaming.async.NettyStreamingChannel@1afc0ada}
and outbound
{0c0b1395=org.apache.cassandra.streaming.async.NettyStreamingChannel@374403be}
channels
DEBUG [Stream-Deserializer-/192.168.189.127:7000-833323b4]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-50e7cefb]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-dfbbe8cd]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-67655215]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-55d116ad]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-1146e1c5]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-f983d95c]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-5cf05913]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-3f061317]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-0c0b1395]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-d4d397f3]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-28c0fe3c]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-3d52d53f]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-55d1221c]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-30dafd8c]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-4fca9827]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
DEBUG [Stream-Deserializer-/192.168.189.127:7000-bea79e0c]
2024-08-14 20:34:10,387 StreamSession.java:677 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Socket closed after session
completed with state COMPLETE
ERROR [main] 2024-08-14 20:34:10,387 StorageService.java:2041 -
Error while waiting on bootstrap to complete. Bootstrap will have to
be restarted.
java.util.concurrent.ExecutionException:
org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.utils.concurrent.AbstractFuture.getWhenDone(AbstractFuture.java:239)
at
org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:246)
at
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:2034)
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1185)
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1145)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:936)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:854)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:421)
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:744)
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:878)
Caused by: org.apache.cassandra.streaming.StreamException: Stream
failed
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:243)
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:205)
at
org.apache.cassandra.streaming.StreamSession.lambda$closeSession$2(StreamSession.java:517)
at
org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
at
org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
at
org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown
Source)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
DEBUG [Messaging-EventLoop-3-15] 2024-08-14 20:34:10,388
StreamingMultiplexedChannel.java:513 - [Stream
#d7bf9f60-5a5e-11ef-aa71-51cb94e3c01f] Closing stream connection
channels on /192.168.189.127:7000
WARN [main] 2024-08-14 20:34:10,456 StorageService.java:1221 - Some
data streaming failed. Use nodetool to check bootstrap state and
resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
INFO [main] 2024-08-14 20:34:10,458 Gossiper.java:2293 - Waiting
for gossip to settle...
DEBUG [main] 2024-08-14 20:34:16,458 Gossiper.java:2305 - Gossip
looks settled.
root@cassandra-15:/# nodetool status -r
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                                                 Load        Tokens  Owns (effective)  Host ID                               Rack
UN  cassandra-6.cassandra.cassandra-jos.svc.cluster.local   2.07 TiB    256     20.0%             34a419fb-4db1-4fb4-8ea7-c0988b2a3d5a  rack1
UN  cassandra-3.cassandra.cassandra-jos.svc.cluster.local   2.08 TiB    256     20.0%             354a9e1f-71b4-4aba-9f03-26778aaa17e2  rack1
UN  cassandra-14.cassandra.cassandra-jos.svc.cluster.local  1.95 TiB    256     20.0%             6e748875-1101-4bc1-a132-c6fe643c6148  rack1
UN  cassandra-5.cassandra.cassandra-jos.svc.cluster.local   2.07 TiB    256     20.0%             ead07b83-a927-4904-bc6c-e8e4249f7560  rack1
UN  cassandra-7.cassandra.cassandra-jos.svc.cluster.local   2.08 TiB    256     20.0%             b5fbc09e-565e-402d-a6d0-3feac5922a13  rack1
UN  cassandra-4.cassandra.cassandra-jos.svc.cluster.local   2.27 TiB    256     20.0%             14fa3f70-a8e2-4f60-8bc9-42c7b3294718  rack1
UN  cassandra-1.cassandra.cassandra-jos.svc.cluster.local   2.07 TiB    256     20.0%             ea5979b8-71fe-4af5-b856-db06e008ced3  rack1
UN  cassandra-11.cassandra.cassandra-jos.svc.cluster.local  2.08 TiB    256     20.0%             f2bb43f7-8381-4b28-8ced-194060267c29  rack1
UN  cassandra-9.cassandra.cassandra-jos.svc.cluster.local   2.07 TiB    256     20.0%             ed1ce133-4473-4bc1-a0cc-a242d6f67acc  rack1
UN  cassandra-2.cassandra.cassandra-jos.svc.cluster.local   2.08 TiB    256     20.0%             2b5db930-2bfe-4829-b353-3b1db0ca368f  rack1
UN  cassandra-12.cassandra.cassandra-jos.svc.cluster.local  2.07 TiB    256     20.0%             1d3ed71f-18b4-4cc0-9c6e-3a5479328496  rack1
UJ  cassandra-15.cassandra.cassandra-jos.svc.cluster.local  874.82 GiB  256     ?                 9d68cd5a-6e48-4156-92c1-5c51f02bce65  rack1
UN  cassandra-0.cassandra.cassandra-jos.svc.cluster.local   2.14 TiB    256     20.0%             5c3d3b46-300d-49f2-b01d-cbdb44d98022  rack1
UN  cassandra-8.cassandra.cassandra-jos.svc.cluster.local   2.07 TiB    256     20.0%             fb2e8220-549e-4316-bb78-53f6cd07318a  rack1
UN  cassandra-10.cassandra.cassandra-jos.svc.cluster.local  2.07 TiB    256     20.0%             a2ab4d22-b564-4d6b-b743-2d78598fe53c  rack1
UN  cassandra-13.cassandra.cassandra-jos.svc.cluster.local  2.07 TiB    256     20.0%             ed865baf-5333-40e5-8c73-9de2df2bd330  rack1
If I restart the bootstrap, it usually completes, but I'd like to
double the size of the cluster, and that's a very long operation. Is
there any way to add multiple nodes at once?
Thanks!
-Joe