CMS is fine at 12G for sure, likely up to 16G. You’ll want to initiate CMS a bit earlier (55-69%), and you’ll likely want new gen to be larger - perhaps 3-6G.
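For example, in cassandra-env.sh that would look roughly like this (illustrative values for a ~12G heap - the exact occupancy fraction and new gen size need tuning for your workload):

    MAX_HEAP_SIZE="12G"
    HEAP_NEWSIZE="4G"
    # start CMS cycles earlier than the stock setting, and only based on occupancy
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=60"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"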
You’ll want to manually set the memtable size - it scales with the heap by default (rough cassandra.yaml sketch at the bottom of this mail). After bootstrap you can lower it again.

--
Jeff Jirsa

> On Aug 29, 2018, at 10:52 PM, Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
>
> I have 72 nodes in the cluster, across 8 datacenters. The moment I try to increase the node count above 84 or so, the issue starts.
>
> I am still using the CMS heap, assuming it will create more harm if I increase the heap size beyond the recommended 8G.
>
>> On Wed, Aug 29, 2018 at 6:53 PM Jeff Jirsa <jji...@gmail.com> wrote:
>> Given the size of your schema, you’re probably getting flooded with a bunch of huge schema mutations as it hops into gossip and tries to pull the schema from every host it sees. You say 8 DCs but you don’t say how many nodes - I’m guessing it’s a lot?
>>
>> This is something that’s incrementally better in 3.0, but a real proper fix has been talked about a few times - https://issues.apache.org/jira/browse/CASSANDRA-11748 and https://issues.apache.org/jira/browse/CASSANDRA-13569 for example
>>
>> In the short term, you may be able to work around this by increasing your heap size. If that doesn’t work, there’s an ugly ugly hack that’ll work on 2.1: limiting the number of schema blobs you can get at a time - in this case, that means firewall off all but a few nodes in your cluster for 10-30 seconds, make sure it gets the schema (watch the logs or the file system for the tables to be created), then remove the firewall so it can start the bootstrap process (it needs the schema to set up the streaming plan, and it needs all the hosts up in gossip to stream successfully, so this is an ugly hack to give you time to get the schema and then heal the cluster so it can bootstrap).
>>
>> Yeah, that’s awful. Hopefully one of the two JIRAs above lands to make this less awful.
>>
>> --
>> Jeff Jirsa
>>
>>> On Aug 29, 2018, at 6:29 PM, Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
>>>
>>> It fails before bootstrap.
>>>
>>> Streaming throughput on the nodes is set to 400 Mb/s.
>>>
>>>> On Wednesday, August 29, 2018, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> Is the bootstrap plan succeeding (does streaming start, or does it crash before it logs messages about streaming starting)?
>>>>
>>>> Have you capped the stream throughput on the existing hosts?
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>> On Aug 29, 2018, at 5:02 PM, Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
>>>>>
>>>>> Hello All,
>>>>>
>>>>> We are seeing an issue when we add more nodes to the cluster: the new node is not able to stream the entire metadata and fails to bootstrap. Finally the process dies with an OOM (java.lang.OutOfMemoryError: Java heap space).
>>>>>
>>>>> But if I remove a few nodes from the cluster, we don't see this issue.
>>>>>
>>>>> Cassandra version: 2.1.16
>>>>> # of KS and CF: 100, 3000 (approx)
>>>>> # of DCs: 8
>>>>> # of vnodes per node: 256
>>>>>
>>>>> Not sure what is causing this behavior - has anyone come across this scenario?
>>>>> Thanks in advance.
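For the memtable sizing mentioned at the top of this mail: in 2.1 the relevant cassandra.yaml settings are memtable_heap_space_in_mb / memtable_offheap_space_in_mb, and when left unset they default to 1/4 of the heap, which is why they grow as you grow the heap. A rough sketch with illustrative values only - check the defaults in your own yaml:

    # cap memtable space explicitly so it stops tracking heap size
    memtable_heap_space_in_mb: 2048
    # only matters if memtable_allocation_type is offheap_buffers / offheap_objects
    memtable_offheap_space_in_mb: 2048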