Hi Mike,

Please find below the details you asked for, plus some others that may help. We are using the JVM params -Xms8G -Xmx8G.
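Since we rely on the stock heap sizing, here is a minimal runnable sketch of the default calculate_heap_sizes logic (paraphrased from cassandra-env.sh, so treat it as an approximation of the real script) with our reported numbers plugged in, as a sanity check:

```shell
# Sketch of the default calculate_heap_sizes logic from cassandra-env.sh,
# with our reported values hard-coded (64544 MB RAM, 16 cores).
system_memory_in_mb=64544
system_cpu_cores=16

# max(min(1/2 ram, 1024MB), min(1/4 ram, 8192MB))
half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`
if [ "$half_system_memory_in_mb" -gt "1024" ]; then
    half_system_memory_in_mb="1024"
fi
if [ "$quarter_system_memory_in_mb" -gt "8192" ]; then
    quarter_system_memory_in_mb="8192"
fi
if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]; then
    max_heap_size_in_mb="$half_system_memory_in_mb"
else
    max_heap_size_in_mb="$quarter_system_memory_in_mb"
fi
MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

# Young generation: min(100MB per core, 1/4 of max heap)
max_sensible_yg_per_core_in_mb="100"
max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*" $system_cpu_cores`
desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]; then
    HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
else
    HEAP_NEWSIZE="${desired_yg_in_mb}M"
fi

echo "MAX_HEAP_SIZE=$MAX_HEAP_SIZE HEAP_NEWSIZE=$HEAP_NEWSIZE"
# prints: MAX_HEAP_SIZE=8192M HEAP_NEWSIZE=1600M
```

This agrees with the walkthrough below: 1/4 of RAM caps at 8192MB, and the 2048MB desired young gen is capped by the 1600MB per-core limit.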
MAX_HEAP_SIZE and HEAP_NEWSIZE are not being set explicitly, so they are calculated by the calculate_heap_sizes function in cassandra-env.sh (i.e., we are using the default calculations). Here are the results; please correct me if I'm wrong:

    system_memory_in_mb: 64544
    system_cpu_cores:    16

For MAX_HEAP_SIZE, the script comments say:

    # set max heap size based on the following
    # max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
    # calculate 1/2 ram and cap to 1024MB
    # calculate 1/4 ram and cap to 8192MB
    # pick the max

From this I figure MAX_HEAP_SIZE comes out to 8GB: 1/2 of RAM caps to 1024MB, 1/4 of RAM (16136MB) caps to 8192MB, and the max of the two is 8192MB.

For HEAP_NEWSIZE:

    max_sensible_yg_per_core_in_mb="100"
    max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*" $system_cpu_cores`   # 100 * 16 = 1600 MB
    desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`                                     # 8192 / 4 = 2048 MB
    if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
    then
        HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
    else
        HEAP_NEWSIZE="${desired_yg_in_mb}M"
    fi

Since 2048 > 1600, the first branch is taken and HEAP_NEWSIZE should be set to 1600M.

As for the cassandra.yaml properties:

memtable_allocation_type: heap_buffers

memtable_cleanup_threshold: we are using the default, i.e.

    # memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
    # memtable_cleanup_threshold: 0.11

memtable_flush_writers: default (2). We can increase this, as we are using SSDs with around 300 IOPS.

memtable_heap_space_in_mb / memtable_offheap_space_in_mb: default values

    # memtable_heap_space_in_mb: 2048
    # memtable_offheap_space_in_mb: 2048

We are using the G1 garbage collector with jdk1.8.0_45.

Best Regards,

On Sun, May 29, 2016 at 5:07 PM, Mike Yeap <wkk1...@gmail.com> wrote:

> Hi Bhuvan, how big are your current commit logs in the failed node, and
> what are the sizes of MAX_HEAP_SIZE and HEAP_NEWSIZE?
>
> Also the values of the following properties in cassandra.yaml?
>
> memtable_allocation_type
> memtable_cleanup_threshold
> memtable_flush_writers
> memtable_heap_space_in_mb
> memtable_offheap_space_in_mb
>
> Regards,
> Mike Yeap
>
> On Sun, May 29, 2016 at 6:18 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi,
>>
>> We are running a 6 node cluster in 2 DCs on DSC 3.0.3, with 3 nodes each.
>> One of the nodes was showing UNREACHABLE on the other nodes in nodetool
>> describecluster, and on that node all the others were showing UNREACHABLE,
>> so as a measure we restarted the node.
>>
>> But on doing that it is stuck, possibly with these messages in system.log:
>>
>> DEBUG [SlabPoolCleaner] 2016-05-29 14:07:28,156 ColumnFamilyStore.java:829 - Enqueuing flush of batches: 226784704 (11%) on-heap, 0 (0%) off-heap
>> DEBUG [main] 2016-05-29 14:07:28,576 CommitLogReplayer.java:415 - Replaying /commitlog/data/CommitLog-6-1464508993391.log (CL version 6, messaging version 10, compression null)
>> DEBUG [main] 2016-05-29 14:07:28,781 ColumnFamilyStore.java:829 - Enqueuing flush of batches: 207333510 (10%) on-heap, 0 (0%) off-heap
>>
>> It is stuck in the MemtablePostFlush / MemtableFlushWriter stages with
>> pending messages. This has been their status per nodetool tpstats for a
>> long time:
>>
>> MemtablePostFlush    Active - 1    Pending - 52    Completed - 16
>> MemtableFlushWriter  Active - 2    Pending - 13    Completed - 15
>>
>> We restarted the node with the log level set to TRACE, but in vain. What
>> could be a possible contingency plan in such a scenario?
>>
>> Best Regards,
>> Bhuvan