This is an intricate matter; I cannot tell the good parameters from the bad ones for sure, because too many things changed at once.
However, there are many things to consider:
- What is your OS?
- Do your nodes have SSDs or mechanical drives? How many cores do you have?
- Is it the CPUs or the I/O that are overloaded?
- What is the write request rate per second, per node and cluster-wide?
- What is the compaction strategy of the tables you are writing into?
- Are you using LOGGED BATCH statements? With heavy writes, it is *NOT* recommended to use LOGGED BATCH statements (a short sketch of the asynchronous alternative follows the quoted thread below).

In our 2.0.14 cluster we experienced node unavailability due to long Full GC pauses. We discovered bogus legacy data: a single outlier was so wrong that it updated the same CQL rows hundreds of thousands of times with duplicate data. Since the tables we were writing to use LCS, this kept Memtables in memory long enough to promote them into the old generation (the MaxTenuringThreshold default is 1). Handling this data proved to be the thing to fix; with default GC settings the cluster (10 nodes) handles 39 write requests/s.

Note that Memtables are allocated on heap with 2.0.x. With 2.1.x they can be allocated off-heap.

-- Brice

On Tue, Apr 21, 2015 at 5:12 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> Any suggestions or comments on this one?
>
> Thanks
> Anuj Wadehra
>
> ------------------------------
> *From*: "Anuj Wadehra" <anujw_2...@yahoo.co.in>
> *Date*: Mon, 20 Apr, 2015 at 11:51 pm
> *Subject*: Re: Handle Write Heavy Loads in Cassandra 2.0.3
>
> Small correction: we are writing to 5 CFs and reading from one at high speed.
>
> Thanks
> Anuj Wadehra
>
> ------------------------------
> *From*: "Anuj Wadehra" <anujw_2...@yahoo.co.in>
> *Date*: Mon, 20 Apr, 2015 at 7:53 pm
> *Subject*: Handle Write Heavy Loads in Cassandra 2.0.3
>
> Hi,
>
> Recently, we discovered that millions of mutations were getting dropped on
> our cluster. Eventually, we solved this problem by increasing the value of
> memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously,
> and one of them has 4 Secondary Indexes.
>
> The new changes also include:
> concurrent_compactors: 12 (earlier it was the default)
> compaction_throughput_mb_per_sec: 32 (earlier it was the default)
> in_memory_compaction_limit_in_mb: 400 (earlier it was the default of 64)
> memtable_flush_writers: 3 (earlier 1)
>
> After making the above changes, our write-heavy workload scenarios started
> giving "promotion failed" exceptions in the GC logs.
> We have done JVM tuning and Cassandra config changes to solve this:
>
> MAX_HEAP_SIZE="12G" (increased heap from 8G to reduce fragmentation)
> HEAP_NEWSIZE="3G"
>
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2" (We observed that even at
> SurvivorRatio=4 our survivor space was getting 100% utilized under heavy
> write load, and we thought that minor collections were directly promoting
> objects to the Tenured generation.)
>
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=20" (Lots of objects were
> moving from Eden to Tenured on each minor collection; maybe related to
> medium-lived objects tied to Memtables and compactions, as suggested by a
> heap dump.)
>
> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
> JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=30000"
> JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000" (this is the default value)
> JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70" (we reduced the
> value to avoid concurrent mode failures)
>
> Cassandra config:
> compaction_throughput_mb_per_sec: 24
> memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; the
> default is 1/4 of the heap, which creates more long-lived objects)
>
> Questions:
> 1. Why did increasing memtable_flush_writers and
> in_memory_compaction_limit_in_mb cause promotion failures in the JVM? Do
> more memtable_flush_writers mean more memtables in memory?
>
> 2. Objects are still getting promoted to the Tenured space at a high rate.
> CMS is running on the old gen every 4-5 minutes under heavy write load,
> and around 750+ minor collections of up to 300 ms happened in 45 minutes.
> Do you see any problems with the new JVM tuning and Cassandra config? Does
> the justification given for those changes sound logical? Any suggestions?
>
> 3. What is the best practice for reducing heap fragmentation/promotion
> failures when allocation and promotion rates are high?
>
> Thanks
> Anuj
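
To illustrate the LOGGED BATCH point above: a minimal sketch, in Java with the DataStax Java driver (2.x-era API), of issuing individual asynchronous writes instead of one large LOGGED BATCH. The keyspace "my_keyspace", the table events (id text, ts bigint, payload text, PRIMARY KEY (id, ts)) and all values are made up for the example; adapt them to your own schema.

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import java.util.ArrayList;
import java.util.List;

public class AsyncWriteSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace

        // Hypothetical table: events (id text, ts bigint, payload text, PRIMARY KEY (id, ts))
        PreparedStatement insert = session.prepare(
                "INSERT INTO events (id, ts, payload) VALUES (?, ?, ?)");

        // Instead of one big LOGGED BATCH spanning many partitions, fire
        // individual asynchronous writes and collect the futures.
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
        for (int i = 0; i < 100; i++) {
            futures.add(session.executeAsync(
                    insert.bind("row-" + i, System.currentTimeMillis(), "payload-" + i)));
        }
        // Block only at the end; this surfaces any write errors. In a real
        // application you would also cap the number of in-flight requests.
        for (ResultSetFuture f : futures) {
            f.getUninterruptibly();
        }

        // An UNLOGGED batch is reasonable only when all statements target the
        // same partition; it skips the batch-log overhead of a LOGGED BATCH.
        BatchStatement unlogged = new BatchStatement(BatchStatement.Type.UNLOGGED);
        unlogged.add(insert.bind("row-0", 1L, "payload-a"));
        unlogged.add(insert.bind("row-0", 2L, "payload-b"));
        session.execute(unlogged);

        cluster.close();
    }
}

A LOGGED BATCH makes the coordinator write the batch to the batch log before applying it, which adds work precisely when the cluster is already write-heavy; individual async writes (or same-partition UNLOGGED batches) let each mutation go straight to its replicas.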