thank you

On Thu, Aug 30, 2018 at 11:58 AM Jeff Jirsa <[email protected]> wrote:
> This is the closest JIRA that comes to mind (from memory, I didn't
> search; there may be others):
> https://issues.apache.org/jira/browse/CASSANDRA-8150
>
> The best all-in-one blog on tuning GC in Cassandra is actually Amy's
> 2.1 tuning guide:
> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html -
> it's somewhat out of date as it's for 2.1, but since that's what
> you're running, that works out in your favor.
>
> On Thu, Aug 30, 2018 at 10:53 AM Jai Bheemsen Rao Dhanwada
> <[email protected]> wrote:
>
>> Hi Jeff,
>>
>> Is there any JIRA that explains why increasing the heap will help?
>> Also, are there any alternatives to increasing the heap size? The
>> last time I tried increasing the heap, longer GC pauses caused more
>> damage in terms of latencies during the pauses.
>>
>> On Wed, Aug 29, 2018 at 11:07 PM Jai Bheemsen Rao Dhanwada
>> <[email protected]> wrote:
>>
>>> okay, thank you
>>>
>>> On Wed, Aug 29, 2018 at 11:04 PM Jeff Jirsa <[email protected]> wrote:
>>>
>>>> You're seeing an OOM, not a socket error / timeout.
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>> On Aug 29, 2018, at 10:56 PM, Jai Bheemsen Rao Dhanwada
>>>> <[email protected]> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> any idea if this is somehow related to
>>>> https://issues.apache.org/jira/browse/CASSANDRA-11840?
>>>> Does increasing the value of streaming_socket_timeout_in_ms to a
>>>> higher value help?
>>>>
>>>> On Wed, Aug 29, 2018 at 10:52 PM Jai Bheemsen Rao Dhanwada
>>>> <[email protected]> wrote:
>>>>
>>>>> I have 72 nodes in the cluster, across 8 datacenters. The moment I
>>>>> try to grow the cluster above 84 nodes or so, the issue starts.
>>>>>
>>>>> I am still using the CMS collector, assuming it will cause more
>>>>> harm if I increase the heap size beyond the recommended 8G.
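[For reference on the heap discussion above: in Cassandra 2.1 the heap and CMS settings live in conf/cassandra-env.sh. A minimal sketch of the relevant knobs follows; the sizes shown are illustrative placeholders, not recommendations for this cluster, and larger heaps can lengthen CMS pauses, which is exactly the trade-off raised in this thread.]

```shell
# conf/cassandra-env.sh (Cassandra 2.1) - heap sizing for CMS.
# The values below are placeholder assumptions for illustration only.
MAX_HEAP_SIZE="12G"   # total heap; overrides the auto-calculated default
HEAP_NEWSIZE="2G"     # young generation size; Amy's guide discusses tuning this

# CMS flags present in the stock 2.1 cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```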
>>>>>
>>>>> On Wed, Aug 29, 2018 at 6:53 PM Jeff Jirsa <[email protected]> wrote:
>>>>>
>>>>>> Given the size of your schema, you're probably getting flooded
>>>>>> with a bunch of huge schema mutations as the new node hops into
>>>>>> gossip and tries to pull the schema from every host it sees. You
>>>>>> say 8 DCs but you don't say how many nodes - I'm guessing it's a
>>>>>> lot?
>>>>>>
>>>>>> This is something that's incrementally better in 3.0, but a real
>>>>>> proper fix has been talked about a few times -
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-11748 and
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-13569 for example.
>>>>>>
>>>>>> In the short term, you may be able to work around this by
>>>>>> increasing your heap size. If that doesn't work, there's an ugly,
>>>>>> ugly hack that'll work on 2.1: limit the number of schema blobs
>>>>>> the new node can pull at a time. In this case, that means
>>>>>> firewalling off all but a few nodes in your cluster for 10-30
>>>>>> seconds, making sure the new node gets the schema (watch the logs
>>>>>> or the file system for the tables to be created), then removing
>>>>>> the firewall so it can start the bootstrap process. (It needs the
>>>>>> schema to set up the streaming plan, and it needs all the hosts up
>>>>>> in gossip to stream successfully, so this is an ugly hack to buy
>>>>>> time to get the schema and then heal the cluster so it can
>>>>>> bootstrap.)
>>>>>>
>>>>>> Yes, that's awful. Hopefully one of the two JIRAs above lands to
>>>>>> make this less awful.
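[The firewall hack described above could be sketched roughly as follows, assuming the default non-SSL inter-node storage port 7000 and iptables; the IP addresses are placeholders and must be adapted to the actual cluster. This is an ops sketch only, run as root on the bootstrapping node, not something the thread itself provides.]

```shell
# Before starting Cassandra on the bootstrapping node: accept
# gossip/storage traffic (default storage_port 7000) only from a few
# "allowed" peers, so the new node pulls the schema from just those hosts.
ALLOWED="10.0.0.1 10.0.0.2"   # placeholder IPs of the nodes left reachable

for ip in $ALLOWED; do
    iptables -A INPUT -p tcp --dport 7000 -s "$ip" -j ACCEPT
done
iptables -A INPUT -p tcp --dport 7000 -j DROP

# ... start Cassandra and watch the logs / data directories until all
# keyspaces and tables have been created locally ...

# Then delete the rules so gossip sees every host and streaming can start:
for ip in $ALLOWED; do
    iptables -D INPUT -p tcp --dport 7000 -s "$ip" -j ACCEPT
done
iptables -D INPUT -p tcp --dport 7000 -j DROP
```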
>>>>>>
>>>>>> --
>>>>>> Jeff Jirsa
>>>>>>
>>>>>> On Aug 29, 2018, at 6:29 PM, Jai Bheemsen Rao Dhanwada
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> It fails before bootstrap.
>>>>>>
>>>>>> Streaming throughput on the nodes is set to 400 Mb/s.
>>>>>>
>>>>>> On Wednesday, August 29, 2018, Jeff Jirsa <[email protected]> wrote:
>>>>>>
>>>>>>> Is the bootstrap plan succeeding (does streaming start, or does
>>>>>>> it crash before it logs messages about streaming starting)?
>>>>>>>
>>>>>>> Have you capped the stream throughput on the existing hosts?
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Jirsa
>>>>>>>
>>>>>>> On Aug 29, 2018, at 5:02 PM, Jai Bheemsen Rao Dhanwada
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> We are seeing an issue when we add more nodes to the cluster: the
>>>>>>> new node's bootstrap is not able to stream the entire metadata
>>>>>>> and fails. Finally the process dies with an OOM
>>>>>>> (java.lang.OutOfMemoryError: Java heap space).
>>>>>>>
>>>>>>> But if I remove a few nodes from the cluster, we don't see this
>>>>>>> issue.
>>>>>>>
>>>>>>> Cassandra version: 2.1.16
>>>>>>> # of KS and CF: 100, 3000 (approx)
>>>>>>> # of DCs: 8
>>>>>>> # of vnodes per node: 256
>>>>>>>
>>>>>>> Not sure what is causing this behavior; has anyone come across
>>>>>>> this scenario? Thanks in advance.
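[As an aside on the stream-throughput question in the thread: the per-node outbound cap is set in cassandra.yaml and can also be changed at runtime with nodetool. A sketch, where 200 is simply the Cassandra default shown as an example value:]

```shell
# cassandra.yaml sets the per-node outbound cap, in megabits per second:
#   stream_throughput_outbound_megabits_per_sec: 200

# The cap can be changed at runtime on an existing host without a restart:
nodetool setstreamthroughput 200

# Setting it to 0 disables throttling entirely:
nodetool setstreamthroughput 0
```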
