I have met a similar issue before. What I did was reduce the heap size for the rebuild and reduce the stream throughput. It depends on the version and your environment, so it may not be your case, but I hope it is helpful.

If you run ps -ef | grep while the rebuild is going, you will see a new java process for the rebuild. Check what memory size it is using; if it is using the default, that may be too much. Just export MAX_HEAP_SIZE before running nodetool rebuild and it will limit the heap size.
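As a rough sketch (the values are only illustrative, and whether the nodetool wrapper script actually picks up MAX_HEAP_SIZE varies by version, so check bin/nodetool on your install first):

    # Cap the heap of the JVM that nodetool launches before starting the
    # rebuild. 512M is an illustrative value -- tune it for your environment.
    export MAX_HEAP_SIZE=512M
    # Stream from the existing data center; <existing-dc> is a placeholder
    # for your source DC name (the Dallas DC in your case).
    nodetool rebuild -- <existing-dc>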
With streamthroughput=600MB/s: if you look with nodetool, at the OS-level file transfers, or in the log, you will see it pulling files from all source nodes (that is 5 in your case), so the aggregate can reach 3 GB/s. The on-premise side may not handle that due to firewall settings.
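If the aggregate is the problem, you can throttle on the source side. A sketch with illustrative numbers (note that nodetool setstreamthroughput takes megabits per second, so it is worth double-checking which unit your 600 was):

    # Run on each of the 5 source (Dallas) nodes. Values are illustrative:
    # 200 megabits/s per node caps the aggregate from 5 nodes at ~1 Gb/s.
    nodetool setstreamthroughput 200
    # Same cap for cross-DC streaming, which is the path a rebuild uses.
    nodetool setinterdcstreamthroughput 200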
Regards,
Jim

On Tue, Oct 5, 2021 at 8:43 AM MyWorld <timeplus.1...@gmail.com> wrote:

> Logged "nodetool failuredetector" every 5 sec. Doesn't seem to be an issue
> with the phi_convict_threshold value.
>
> On Tue, Oct 5, 2021 at 4:35 PM Surbhi Gupta <surbhi.gupt...@gmail.com> wrote:
>
>> Hi,
>>
>> Try to adjust phi_convict_threshold and see if that helps.
>> When we did a migration from on-prem to AWS, this was one of the factors
>> to consider.
>>
>> Thanks
>>
>> On Tue, Oct 5, 2021 at 4:00 AM MyWorld <timeplus.1...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Need urgent help.
>>> We have one physical data center of 5 nodes with 1 TB of data on each
>>> (location: Dallas). Currently we are using Cassandra ver 3.0.9. Now we
>>> are adding one more data center of 5 nodes (location: GCP-US) and have
>>> joined it to the existing one.
>>>
>>> While running the nodetool rebuild command, we are getting the
>>> following error.
>>> On the GCP node (where we ran the rebuild command):
>>>
>>>> ERROR [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,246 StreamSession.java:639 - [Stream #66646d30-25a2-11ec-903b-774f88efe725] Remote peer 192.x.x.x failed stream session.
>>>> INFO [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,266 StreamResultFuture.java:183 - [Stream #66646d30-25a2-11ec-903b-774f88efe725] Session with /192.x.x.x is complete
>>>
>>> On the DL source node:
>>>
>>>> INFO [STREAM-IN-/34.x.x.x] 2021-10-05 15:55:53,785 StreamResultFuture.java:183 - [Stream #66646d30-25a2-11ec-903b-774f88efe725] Session with /34.x.x.x is complete
>>>> ERROR [STREAM-OUT-/34.x.x.x] 2021-10-05 15:55:53,785 StreamSession.java:534 - [Stream #66646d30-25a2-11ec-903b-774f88efe725] Streaming error occurred
>>>> java.lang.RuntimeException: Transfer of file /var/lib/cassandra/data/clickstream/glusr_usr_paid_url_mv-3c49c392b35511e9bd0a8f42dfb09617/mc-45676-big-Data.db already completed or aborted (perhaps session failed?).
>>>>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage.startTransfer(OutgoingFileMessage.java:120) ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50) ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48) ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:387) ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:367) ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_192]
>>>> WARN [STREAM-IN-/34.x.x.x] 2021-10-05 15:55:53,786 StreamResultFuture.java:210 - [Stream #66646d30-25a2-11ec-903b-774f88efe725] Stream failed
>>>
>>> Before starting this rebuild, we made the following changes:
>>> 1. Set setstreamthroughput to 600 Mb/sec
>>> 2. Set setinterdcstreamthroughput to 600 Mb/sec
>>> 3. streaming_socket_timeout_in_ms is 24 hrs
>>> 4. Disabled autocompaction on the GCP node, as it was heavily utilising
>>> CPU resources
>>>
>>> FYI, the GCP rebuild process starts with data streaming from 3 nodes,
>>> and all fail one by one after streaming for a few hours.
>>> Please help us figure out how to correct this issue.
>>> Is there any other way to rebuild such a large data set?
>>> We have a few tables with 200-400 GB of data and some smaller tables.
>>> Also, we have materialized views (MViews) in our environment.
>>>
>>> Regards,
>>> Ashish Gupta