Hi Jeff,

Is there a JIRA that confirms increasing the heap will help?
Also, are there any alternatives to increasing the heap size? Last time I
tried increasing the heap, the longer GC pauses caused more damage in terms
of latency during each pause.
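
For reference, the only knob I know of here is in cassandra-env.sh, roughly
like this (a sketch; the values are illustrative, not a recommendation):

    # cassandra-env.sh (2.1) - heap sizing for the CMS collector
    MAX_HEAP_SIZE="12G"   # raised from 8G; fewer OOMs, but longer CMS pauses
    HEAP_NEWSIZE="3G"     # young generation, usually sized along with the heap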

On Wed, Aug 29, 2018 at 11:07 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> okay, thank you
>
> On Wed, Aug 29, 2018 at 11:04 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> You’re seeing an OOM, not a socket error / timeout.
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Aug 29, 2018, at 10:56 PM, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>> Jeff,
>>
>> Any idea if this is somehow related to
>> https://issues.apache.org/jira/browse/CASSANDRA-11840?
>> Does increasing streaming_socket_timeout_in_ms to a higher value help?
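>>
>> For reference, the change I had in mind is just this line in cassandra.yaml
>> (the value is illustrative, not a recommendation):
>>
>>     # cassandra.yaml - streaming socket timeout, in milliseconds
>>     streaming_socket_timeout_in_ms: 86400000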
>>
>> On Wed, Aug 29, 2018 at 10:52 PM Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> I have 72 nodes in the cluster, across 8 datacenters. The moment I try to
>>> grow the cluster beyond 84 nodes or so, the issue starts.
>>>
>>> I am still using the CMS collector, on the assumption that increasing the
>>> heap beyond the recommended 8 GB will do more harm than good.
>>>
>>> On Wed, Aug 29, 2018 at 6:53 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> Given the size of your schema, you’re probably getting flooded with a
>>>> bunch of huge schema mutations as it hops into gossip and tries to pull the
>>>> schema from every host it sees. You say 8 DCs but you don’t say how many
>>>> nodes - I’m guessing it’s a lot?
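>>>>
>>>> Back-of-envelope, with made-up numbers just to show the shape of it: ~3000
>>>> tables at a few KB of serialized schema each is on the order of 10 MB per
>>>> full schema pull, and if ~80 peers all answer at roughly the same time
>>>> that is several hundred MB of schema mutations arriving together, before
>>>> counting the object overhead of deserializing and applying them - enough
>>>> to put real pressure on an 8 GB CMS heap.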
>>>>
>>>> This is something that’s incrementally better in 3.0, but a proper fix
>>>> has been talked about a few times - see
>>>> https://issues.apache.org/jira/browse/CASSANDRA-11748 and
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13569, for example.
>>>>
>>>> In the short term, you may be able to work around this by increasing
>>>> your heap size. If that doesn’t work, there’s an ugly, ugly hack that’ll
>>>> work on 2.1: limit the number of schema blobs the node can get at a time.
>>>> In this case, that means firewalling off all but a few nodes in your
>>>> cluster for 10-30 seconds, making sure the new node gets the schema
>>>> (watch the logs or the file system for the table directories to be
>>>> created), then removing the firewall so it can start the bootstrap
>>>> process. It needs the schema to set up the streaming plan, and it needs
>>>> all the hosts up in gossip to stream successfully, so this is just an
>>>> ugly way to buy time to get the schema and then heal the cluster so it
>>>> can bootstrap.
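>>>>
>>>> Very roughly, and assuming iptables, the default storage_port of 7000,
>>>> and placeholder IPs for the couple of nodes you leave reachable, the hack
>>>> looks something like this on the joining node (a sketch, not a recipe):
>>>>
>>>>     # allow internode traffic from two nodes only; drop the rest
>>>>     iptables -A INPUT -p tcp --dport 7000 -s 10.0.0.1 -j ACCEPT
>>>>     iptables -A INPUT -p tcp --dport 7000 -s 10.0.0.2 -j ACCEPT
>>>>     iptables -A INPUT -p tcp --dport 7000 -j DROP
>>>>
>>>>     # start cassandra, then watch for the schema to arrive, e.g.
>>>>     grep -i "schema" /var/log/cassandra/system.log
>>>>     ls /var/lib/cassandra/data/              # table directories show up here
>>>>
>>>>     # once the schema is in, heal the cluster so bootstrap can stream
>>>>     iptables -D INPUT -p tcp --dport 7000 -j DROP
>>>>     iptables -D INPUT -p tcp --dport 7000 -s 10.0.0.2 -j ACCEPT
>>>>     iptables -D INPUT -p tcp --dport 7000 -s 10.0.0.1 -j ACCEPT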
>>>>
>>>> Yea that’s awful. Hopefully either of the two above JIRAs lands to make
>>>> this less awful.
>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Aug 29, 2018, at 6:29 PM, Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>> It fails before the bootstrap starts.
>>>>
>>>> Streaming throughput on the nodes is set to 400 Mb/s.
>>>>
>>>> On Wednesday, August 29, 2018, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>>> Is the bootstrap plan succeeding (does streaming start or does it
>>>>> crash before it logs messages about streaming starting)?
>>>>>
>>>>> Have you capped the stream throughput on the existing hosts?
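>>>>>
>>>>> (If it isn’t capped, it can be changed on a live node without a restart;
>>>>> the value is in megabits per second, and 200 here is just an example:
>>>>>
>>>>>     nodetool setstreamthroughput 200
>>>>>
>>>>> or persistently via stream_throughput_outbound_megabits_per_sec in
>>>>> cassandra.yaml.)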
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> On Aug 29, 2018, at 5:02 PM, Jai Bheemsen Rao Dhanwada <
>>>>> jaibheem...@gmail.com> wrote:
>>>>>
>>>>> Hello All,
>>>>>
>>>>> We are seeing an issue when we add more nodes to the cluster: the new
>>>>> node is not able to stream all of the metadata and fails to bootstrap.
>>>>> Eventually the process dies with an OOM (java.lang.OutOfMemoryError:
>>>>> Java heap space).
>>>>>
>>>>> But if I remove a few nodes from the cluster, we don't see this issue.
>>>>>
>>>>> Cassandra version: 2.1.16
>>>>> # of keyspaces / tables (CFs): ~100 / ~3000
>>>>> # of DCs: 8
>>>>> # of vnodes per node: 256
>>>>>
>>>>> Not sure what is causing this behavior; has anyone come across this
>>>>> scenario? Thanks in advance.
>>>>>
>>>>>
