I ran into a similar issue before. What I did was: reduce the heap size for
the rebuild, and reduce the stream throughput.
It depends on the version and your environment, so it may not be your case;
I just hope it helps.

With ps -ef | grep you will see a new java process for the rebuild; check how
much memory it is using. If it uses the default it may be too much, so just
export MAX_HEAP_SIZE before nodetool rebuild and it will limit the heap size.
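
Roughly, that looks like the following (512M is only an example value, and the
dc name is a placeholder):

    # find the rebuild java process and check how much memory it uses
    ps -ef | grep java | grep -v grep

    # cap the heap before kicking off the rebuild
    export MAX_HEAP_SIZE=512M
    nodetool rebuild <source_dc_name>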

With streamthroughput=600 Mb/s, if you look with nodetool, at OS-level file
transfers, or in the log, you will see it pulling files from all source nodes
--- that is 5 in your case --- so the aggregate is about 3 Gb/s, and the
on-premise side may not handle that due to firewall settings.
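
If you need to throttle it back, something along these lines (200 is only an
illustration):

    # per-source-node cap in megabits/sec; 5 sources x 200 Mb/s ~= 1 Gb/s total
    nodetool setstreamthroughput 200
    nodetool setinterdcstreamthroughput 200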

Regards,
Jim

On Tue, Oct 5, 2021 at 8:43 AM MyWorld <timeplus.1...@gmail.com> wrote:

> Logged "nodetool failuredetector" every 5 sec. It doesn't seem to be an issue
> with the phi_convict_threshold value.
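>
> For reference, the logging was just a loop along these lines (fd.log is a
> placeholder name):
>
>     while true; do nodetool failuredetector >> fd.log; sleep 5; done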
>
> On Tue, Oct 5, 2021 at 4:35 PM Surbhi Gupta <surbhi.gupt...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Try to adjust phi_convict_threshold and see if that helps.
>> When we did a migration from on-prem to AWS, this was one of the factors to
>> consider.
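>>
>> As a sketch, that is a cassandra.yaml change on each node, followed by a
>> restart to pick it up (12 is only an example; the default is 8):
>>
>>     phi_convict_threshold: 12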
>>
>> Thanks
>>
>>
>> On Tue, Oct 5, 2021 at 4:00 AM MyWorld <timeplus.1...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Need urgent help.
>>> We have one physical data center of 5 nodes with 1 TB of data on each
>>> (location: Dallas). Currently we are using Cassandra ver 3.0.9. Now we are
>>> adding one more data center of 5 nodes (location: GCP-US) and have joined it
>>> to the existing one.
>>>
>>> While running the nodetool rebuild command, we are getting the following
>>> error. On the GCP node (where we ran the rebuild command):
>>>
>>>> ERROR [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,246
>>>> StreamSession.java:639 - [Stream #66646d30-25a2-11ec-903b-774f88efe725]
>>>> Remote peer 192.x.x.x failed stream session.
>>>> INFO  [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,266
>>>> StreamResultFuture.java:183 - [Stream
>>>> #66646d30-25a2-11ec-903b-774f88efe725] Session with /192.x.x.x is complete
>>>
>>>
>>> On the DL source node:
>>>
>>>> INFO  [STREAM-IN-/34.x.x.x] 2021-10-05 15:55:53,785
>>>> StreamResultFuture.java:183 - [Stream
>>>> #66646d30-25a2-11ec-903b-774f88efe725] Session with /34.x.x.x is complete
>>>> ERROR [STREAM-OUT-/34.x.x.x] 2021-10-05 15:55:53,785
>>>> StreamSession.java:534 - [Stream #66646d30-25a2-11ec-903b-774f88efe725]
>>>> Streaming error occurred
>>>> java.lang.RuntimeException: Transfer of file
>>>> /var/lib/cassandra/data/clickstream/glusr_usr_paid_url_mv-3c49c392b35511e9bd0a8f42dfb09617/mc-45676-big-Data.db
>>>> already completed or aborted (perhaps session failed?).
>>>>         at
>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage.startTransfer(OutgoingFileMessage.java:120)
>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>         at
>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>         at
>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>         at
>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48)
>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>         at
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:387)
>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>         at
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:367)
>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_192]
>>>> WARN  [STREAM-IN-/34.x.x.x] 2021-10-05 15:55:53,786
>>>> StreamResultFuture.java:210 - [Stream
>>>> #66646d30-25a2-11ec-903b-774f88efe725] Stream failed
>>>
>>>
>>> Before starting this rebuild, we made the following changes (roughly the
>>> commands sketched below):
>>> 1. Set setstreamthroughput to 600 Mb/sec
>>> 2. Set setinterdcstreamthroughput to 600 Mb/sec
>>> 3. streaming_socket_timeout_in_ms is 24 hrs
>>> 4. Disabled autocompaction on the GCP node as it was heavily utilising CPU
>>> resources
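>>>
>>> A rough equivalent of those steps (the yaml setting lives in cassandra.yaml
>>> and needs a restart to apply; disableautocompaction with no arguments acts
>>> on all keyspaces):
>>>
>>>     nodetool setstreamthroughput 600
>>>     nodetool setinterdcstreamthroughput 600
>>>     # cassandra.yaml: streaming_socket_timeout_in_ms: 86400000  (24 hrs)
>>>     nodetool disableautocompaction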
>>>
>>> FYI, the GCP rebuild process starts with data streaming from 3 nodes, and
>>> they all fail one by one after streaming for a few hours.
>>> Please help us figure out how to correct this issue.
>>> Is there any other way to rebuild such a large amount of data?
>>> We have a few tables with 200-400 GB of data and some smaller tables.
>>> Also, we have materialized views (MVs) in our environment.
>>>
>>> Regards,
>>> Ashish Gupta
>>>
>>>
>>
