thank you

On Thu, Aug 30, 2018 at 11:58 AM Jeff Jirsa <[email protected]> wrote:
> This is the closest JIRA that comes to mind (from memory, I didn't
> search; there may be others):
> https://issues.apache.org/jira/browse/CASSANDRA-8150
>
> The best all-in-one blog on tuning GC in Cassandra is actually Amy's
> 2.1 tuning guide:
> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html -
> it's somewhat out of date as it's for 2.1, but since that's what
> you're running, that works out in your favor.
>
> On Thu, Aug 30, 2018 at 10:53 AM Jai Bheemsen Rao Dhanwada
> <[email protected]> wrote:
>
>> Hi Jeff,
>>
>> Is there any JIRA that explains why increasing the heap will help?
>> Also, are there any alternatives to increasing the heap size? The
>> last time I tried increasing the heap, longer GC pauses caused more
>> damage in terms of latencies during the pauses.
>>
>> On Wed, Aug 29, 2018 at 11:07 PM Jai Bheemsen Rao Dhanwada
>> <[email protected]> wrote:
>>
>>> okay, thank you
>>>
>>> On Wed, Aug 29, 2018 at 11:04 PM Jeff Jirsa <[email protected]> wrote:
>>>
>>>> You're seeing an OOM, not a socket error / timeout.
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>> On Aug 29, 2018, at 10:56 PM, Jai Bheemsen Rao Dhanwada
>>>> <[email protected]> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> any idea if this is somehow related to
>>>> https://issues.apache.org/jira/browse/CASSANDRA-11840?
>>>> Does increasing the value of streaming_socket_timeout_in_ms to a
>>>> higher value help?
>>>>
>>>> On Wed, Aug 29, 2018 at 10:52 PM Jai Bheemsen Rao Dhanwada
>>>> <[email protected]> wrote:
>>>>
>>>>> I have 72 nodes in the cluster, across 8 datacenters. The moment I
>>>>> try to grow the cluster above 84 nodes or so, the issue starts.
>>>>>
>>>>> I am still using the CMS collector, assuming it will cause more
>>>>> harm if I increase the heap size beyond the recommended 8G.
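[For reference on the heap discussion above: in Cassandra 2.1 the heap and CMS settings live in conf/cassandra-env.sh. A minimal sketch of the relevant knobs follows; the sizes shown are illustrative placeholders, not recommendations for this cluster, and larger heaps can lengthen CMS pauses, which is exactly the trade-off raised in this thread.]

```shell
# conf/cassandra-env.sh (Cassandra 2.1) - heap sizing for CMS.
# The values below are placeholder assumptions for illustration only.
MAX_HEAP_SIZE="12G"   # total heap; overrides the auto-calculated default
HEAP_NEWSIZE="2G"     # young generation size; Amy's guide discusses tuning this

# CMS flags present in the stock 2.1 cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```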
>>>>>
>>>>> On Wed, Aug 29, 2018 at 6:53 PM Jeff Jirsa <[email protected]> wrote:
>>>>>
>>>>>> Given the size of your schema, you're probably getting flooded
>>>>>> with a bunch of huge schema mutations as the new node hops into
>>>>>> gossip and tries to pull the schema from every host it sees. You
>>>>>> say 8 DCs but you don't say how many nodes - I'm guessing it's a
>>>>>> lot?
>>>>>>
>>>>>> This is something that's incrementally better in 3.0, but a real
>>>>>> proper fix has been talked about a few times -
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-11748 and
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-13569 for example.
>>>>>>
>>>>>> In the short term, you may be able to work around this by
>>>>>> increasing your heap size. If that doesn't work, there's an ugly,
>>>>>> ugly hack that'll work on 2.1: limit the number of schema blobs
>>>>>> the new node can pull at a time. In this case, that means
>>>>>> firewalling off all but a few nodes in your cluster for 10-30
>>>>>> seconds, making sure the new node gets the schema (watch the logs
>>>>>> or the file system for the tables to be created), then removing
>>>>>> the firewall so it can start the bootstrap process. (It needs the
>>>>>> schema to set up the streaming plan, and it needs all the hosts up
>>>>>> in gossip to stream successfully, so this is an ugly hack to buy
>>>>>> time to get the schema and then heal the cluster so it can
>>>>>> bootstrap.)
>>>>>>
>>>>>> Yes, that's awful. Hopefully one of the two JIRAs above lands to
>>>>>> make this less awful.
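[The firewall hack described above could be sketched roughly as follows, assuming the default non-SSL inter-node storage port 7000 and iptables; the IP addresses are placeholders and must be adapted to the actual cluster. This is an ops sketch only, run as root on the bootstrapping node, not something the thread itself provides.]

```shell
# Before starting Cassandra on the bootstrapping node: accept
# gossip/storage traffic (default storage_port 7000) only from a few
# "allowed" peers, so the new node pulls the schema from just those hosts.
ALLOWED="10.0.0.1 10.0.0.2"   # placeholder IPs of the nodes left reachable

for ip in $ALLOWED; do
    iptables -A INPUT -p tcp --dport 7000 -s "$ip" -j ACCEPT
done
iptables -A INPUT -p tcp --dport 7000 -j DROP

# ... start Cassandra and watch the logs / data directories until all
# keyspaces and tables have been created locally ...

# Then delete the rules so gossip sees every host and streaming can start:
for ip in $ALLOWED; do
    iptables -D INPUT -p tcp --dport 7000 -s "$ip" -j ACCEPT
done
iptables -D INPUT -p tcp --dport 7000 -j DROP
```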
>>>>>>
>>>>>> --
>>>>>> Jeff Jirsa
>>>>>>
>>>>>> On Aug 29, 2018, at 6:29 PM, Jai Bheemsen Rao Dhanwada
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> It fails before bootstrap.
>>>>>>
>>>>>> Streaming throughput on the nodes is set to 400 Mb/s.
>>>>>>
>>>>>> On Wednesday, August 29, 2018, Jeff Jirsa <[email protected]> wrote:
>>>>>>
>>>>>>> Is the bootstrap plan succeeding (does streaming start, or does
>>>>>>> it crash before it logs messages about streaming starting)?
>>>>>>>
>>>>>>> Have you capped the stream throughput on the existing hosts?
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Jirsa
>>>>>>>
>>>>>>> On Aug 29, 2018, at 5:02 PM, Jai Bheemsen Rao Dhanwada
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> We are seeing an issue when we add more nodes to the cluster: the
>>>>>>> new node's bootstrap is not able to stream the entire metadata
>>>>>>> and fails. Finally the process dies with an OOM
>>>>>>> (java.lang.OutOfMemoryError: Java heap space).
>>>>>>>
>>>>>>> But if I remove a few nodes from the cluster, we don't see this
>>>>>>> issue.
>>>>>>>
>>>>>>> Cassandra version: 2.1.16
>>>>>>> # of KS and CF: 100, 3000 (approx)
>>>>>>> # of DCs: 8
>>>>>>> # of vnodes per node: 256
>>>>>>>
>>>>>>> Not sure what is causing this behavior; has anyone come across
>>>>>>> this scenario? Thanks in advance.
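[As an aside on the stream-throughput question in the thread: the per-node outbound cap is set in cassandra.yaml and can also be changed at runtime with nodetool. A sketch, where 200 is simply the Cassandra default shown as an example value:]

```shell
# cassandra.yaml sets the per-node outbound cap, in megabits per second:
#   stream_throughput_outbound_megabits_per_sec: 200

# The cap can be changed at runtime on an existing host without a restart:
nodetool setstreamthroughput 200

# Setting it to 0 disables throttling entirely:
nodetool setstreamthroughput 0
```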
