Jeff,

Any idea whether this is somehow related to
https://issues.apache.org/jira/browse/CASSANDRA-11840?
Would increasing streaming_socket_timeout_in_ms to a higher value help?

On Wed, Aug 29, 2018 at 10:52 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> I have 72 nodes in the cluster, across 8 datacenters. The moment I try to
> increase the node count above 84 or so, the issue starts.
>
> I am still using the CMS collector, and I assume that increasing the heap
> size beyond the recommended 8G would do more harm than good.
>
> On Wed, Aug 29, 2018 at 6:53 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Given the size of your schema, the new node is probably getting flooded
>> with a bunch of huge schema mutations as it hops into gossip and tries to
>> pull the schema from every host it sees. You say 8 DCs but you don’t say
>> how many nodes - I’m guessing it’s a lot?
>>
>> This is something that’s incrementally better in 3.0, but a real proper
>> fix has been talked about a few times:
>> https://issues.apache.org/jira/browse/CASSANDRA-11748 and
>> https://issues.apache.org/jira/browse/CASSANDRA-13569, for example.
>>
>> In the short term, you may be able to work around this by increasing your
>> heap size. If that doesn’t work, there’s an ugly, ugly hack that’ll work on
>> 2.1: limiting the number of schema blobs the new node can get at a time. In
>> this case, that means firewalling off all but a few nodes in your cluster
>> for 10-30 seconds, making sure the new node gets the schema (watch the logs
>> or file system for the tables to be created), then removing the firewall so
>> it can start the bootstrap process. It needs the schema to set up the
>> streaming plan, and it needs all the hosts up in gossip to stream
>> successfully, so this is an ugly hack to give you time to get the schema
>> and then heal the cluster so it can bootstrap.
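
A minimal sketch of the firewall workaround described above, assuming iptables on the joining node and Cassandra's default storage_port of 7000. The function names and peer addresses are hypothetical. The script only builds and prints the commands; it does not run them and has not been tested against a real cluster.

```python
from typing import List

# Assumption: inter-node traffic uses Cassandra's default storage_port.
STORAGE_PORT = 7000


def schema_window_rules(allowed_peers: List[str], port: int = STORAGE_PORT) -> List[str]:
    """Build iptables commands that accept inbound inter-node traffic
    only from allowed_peers and drop it from everyone else."""
    rules = [
        f"iptables -A INPUT -p tcp --dport {port} -s {peer} -j ACCEPT"
        for peer in allowed_peers
    ]
    # Catch-all drop goes last so the ACCEPT rules above match first.
    rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


def teardown_rules(allowed_peers: List[str], port: int = STORAGE_PORT) -> List[str]:
    """Build the matching delete commands (-D) to lift the firewall
    once the schema has arrived."""
    return [
        rule.replace("iptables -A", "iptables -D", 1)
        for rule in schema_window_rules(allowed_peers, port)
    ]


if __name__ == "__main__":
    # Hypothetical peers (documentation address range), not real hosts.
    peers = ["192.0.2.10", "192.0.2.11"]
    print("# before starting the new node:")
    print("\n".join(schema_window_rules(peers)))
    print("# after the tables show up in the logs or on disk:")
    print("\n".join(teardown_rules(peers)))
```

The idea would be to apply the first set on the new node before starting Cassandra, wait until the schema has been pulled from the few allowed peers, then apply the second set so gossip sees every host again before streaming begins.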
>>
>> Yeah, that’s awful. Hopefully one of the two JIRAs above lands to make
>> this less awful.
>>
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Aug 29, 2018, at 6:29 PM, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>> It fails before bootstrap
>>
>> Streaming throughput on the nodes is set to 400 Mb/s.
>>
>> On Wednesday, August 29, 2018, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Is the bootstrap plan succeeding (does streaming start or does it crash
>>> before it logs messages about streaming starting)?
>>>
>>> Have you capped the stream throughput on the existing hosts?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Aug 29, 2018, at 5:02 PM, Jai Bheemsen Rao Dhanwada <
>>> jaibheem...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> We are seeing an issue when we add more nodes to the cluster: the new
>>> node is not able to stream the entire metadata and fails to bootstrap.
>>> Eventually the process dies with OOM (java.lang.OutOfMemoryError:
>>> Java heap space).
>>>
>>> But if I remove a few nodes from the cluster, we don't see this issue.
>>>
>>> Cassandra Version: 2.1.16
>>> # of KS and CF : 100, 3000 (approx)
>>> # of DC: 8
>>> # of Vnodes per node: 256
>>>
>>> Not sure what is causing this behavior. Has anyone come across this
>>> scenario?
>>> Thanks in advance.
>>>
>>>
