I am trying to capture this again, but from my first attempt it does not look like these numbers vary all that much between when the cluster reboots and when the nodes start crashing.
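For the next attempt, a loop along these lines could log the node's total periodically so the trend is easier to see (a rough, untested sketch; the password placeholder, the 5-minute interval, and the log file name are arbitrary choices):

    while true; do
        # sum the per-table "Bloom filter space used" values reported by nodetool
        total=$(nodetool -u cassandra -pw 'XXXXXXXX' tablestats \
            | awk '/Bloom filter space used:/ {sum += $5} END {print sum}')
        # append a timestamped sample to a log file for later comparison
        echo "$(date -Is) ${total}" >> /tmp/bloom_filter_space.log
        sleep 300
    done

And here is the single capture from the first attempt: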
[root@avesterra-prod-1 ~]# nodetool -u cassandra -pw '......' tablestats| grep "Bloom filter space used:"
Bloom filter space used: 2041877200
Bloom filter space used: 0
Bloom filter space used: 1936840
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 352
Bloom filter space used: 0
Bloom filter space used: 48
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 48
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 72
Bloom filter space used: 720
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 32
Bloom filter space used: 56
Bloom filter space used: 0
Bloom filter space used: 32
Bloom filter space used: 32
Bloom filter space used: 56
Bloom filter space used: 56
Bloom filter space used: 32
Bloom filter space used: 32
Bloom filter space used: 32
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
[root@avesterra-prod-1 ~]#

On Mon, Mar 14, 2016 at 4:43 PM, Paulo Motta <pauloricard...@gmail.com> wrote:

> Sorry, the command is actually nodetool tablestats and you should watch
> the bloom filter size or similar metrics.
>
> 2016-03-14 17:35 GMT-03:00 Mohamed Lrhazi <mohamed.lrh...@georgetown.edu>:
>
>> Hi Paulo,
>>
>> Which metric should I watch for this?
>>
>> [root@avesterra-prod-1 ~]# rpm -qa| grep datastax
>> datastax-ddc-3.2.1-1.noarch
>> datastax-ddc-tools-3.2.1-1.noarch
>> [root@avesterra-prod-1 ~]# cassandra -v
>> 3.2.1
>> [root@avesterra-prod-1 ~]#
>>
>> [root@avesterra-prod-1 ~]# nodetool -u cassandra -pw '########' tpstats
>>
>> Pool Name                        Active   Pending   Completed   Blocked   All time blocked
>> MutationStage                         0         0       13609         0                  0
>> ViewMutationStage                     0         0           0         0                  0
>> ReadStage                             0         0           0         0                  0
>> RequestResponseStage                  0         0           8         0                  0
>> ReadRepairStage                       0         0           0         0                  0
>> CounterMutationStage                  0         0           0         0                  0
>> MiscStage                             0         0           0         0                  0
>> CompactionExecutor                    1         1       17556         0                  0
>> MemtableReclaimMemory                 0         0          38         0                  0
>> PendingRangeCalculator                0         0           8         0                  0
>> GossipStage                           0         0      118094         0                  0
>> SecondaryIndexManagement              0         0           0         0                  0
>> HintsDispatcher                       0         0           0         0                  0
>> MigrationStage                        0         0           0         0                  0
>> MemtablePostFlush                     0         0          55         0                  0
>> PerDiskMemtableFlushWriter_0          0         0          38         0                  0
>> ValidationExecutor                    0         0           0         0                  0
>> Sampler                               0         0           0         0                  0
>> MemtableFlushWriter                   0         0          38         0                  0
>> InternalResponseStage                 0         0           0         0                  0
>> AntiEntropyStage                      0         0           0         0                  0
>> CacheCleanupExecutor                  0         0           0         0                  0
>> Native-Transport-Requests             0         0           0         0                  0
>>
>> Message type       Dropped
>> READ                     0
>> RANGE_SLICE              0
>> _TRACE                   0
>> HINT                     0
>> MUTATION                 0
>> COUNTER_MUTATION         0
>> BATCH_STORE              0
>> BATCH_REMOVE             0
>> REQUEST_RESPONSE         0
>> PAGED_RANGE              0
>> READ_REPAIR              0
>> [root@avesterra-prod-1 ~]#
>>
>> Thanks a lot,
>> Mohamed.
>>
>> On Mon, Mar 14, 2016 at 8:22 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>
>>> Can you check with nodetool tpstats if bloom filter mem space
>>> utilization is very large/ramping up before the node gets killed? You could
>>> be hitting CASSANDRA-11344.
>>> 2016-03-12 19:43 GMT-03:00 Mohamed Lrhazi <mohamed.lrh...@georgetown.edu>:
>>>
>>>> In my case, all the nodes seem to be constantly logging messages like this:
>>>>
>>>> DEBUG [GossipStage:1] 2016-03-12 17:41:19,123 FailureDetector.java:456 - Ignoring interval time of 2000928319 for /10.212.18.170
>>>>
>>>> What does that mean?
>>>>
>>>> Thanks a lot,
>>>> Mohamed.
>>>>
>>>> On Sat, Mar 12, 2016 at 5:39 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
>>>>
>>>>> Oh wow, similar behavior with a different version altogether!
>>>>>
>>>>> On Sat, Mar 12, 2016 at 5:28 PM, ssiv...@gmail.com <ssiv...@gmail.com> wrote:
>>>>>
>>>>>> Hi, I'll duplicate here my email about the same issue:
>>>>>>
>>>>>> "I have 7 nodes running C* v2.2.5 on CentOS 7, using jemalloc for dynamic
>>>>>> storage allocation, with only one keyspace and one table on the leveled
>>>>>> compaction strategy. I loaded ~500 GB of data into the cluster with a
>>>>>> replication factor of 3 and waited for compaction to finish, but during
>>>>>> compaction each of the C* nodes allocates all the available memory (~128 GB)
>>>>>> and its process simply stops. Is this a known bug?"
>>>>>>
>>>>>> On 03/13/2016 12:56 AM, Mohamed Lrhazi wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We installed the DataStax Community edition on 8 nodes running RHEL 7 and
>>>>>> inserted some 7 billion rows into a fairly simple table. The inserts seem to
>>>>>> have completed without issues, but ever since then the nodes reliably run out
>>>>>> of RAM after a few hours, without any user activity at all: no reads or
>>>>>> writes are sent. What should we look for to identify the root cause?
>>>>>>
>>>>>> [root@avesterra-prod-1 ~]# cat /etc/redhat-release
>>>>>> Red Hat Enterprise Linux Server release 7.2 (Maipo)
>>>>>> [root@avesterra-prod-1 ~]# rpm -qa| grep datastax
>>>>>> datastax-ddc-3.2.1-1.noarch
>>>>>> datastax-ddc-tools-3.2.1-1.noarch
>>>>>> [root@avesterra-prod-1 ~]#
>>>>>>
>>>>>> The nodes had 8 GB of RAM, which we doubled twice, and we are now trying
>>>>>> 40 GB; they still manage to consume it all and cause oom_killer to kick in.
>>>>>>
>>>>>> Pretty much all the settings are the defaults the installation created.
>>>>>>
>>>>>> Thanks,
>>>>>> Mohamed.
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Serj
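The next time oom_killer fires on a node, a few quick checks along these lines should help tell JVM heap growth apart from off-heap growth (a rough sketch; the cassandra-env.sh path is a guess for this install and may differ):

    # confirm what oom_killer actually killed and how large it was at the time
    grep -i 'killed process' /var/log/messages | tail

    # current resident and virtual size of the running Cassandra JVM
    ps -o pid,rss,vsz,args -p "$(pgrep -f CassandraDaemon)"

    # heap settings in effect (config path may differ for the datastax-ddc packages)
    grep -E 'MAX_HEAP_SIZE|HEAP_NEWSIZE' /etc/cassandra/conf/cassandra-env.sh

Comparing the process RSS against the configured MAX_HEAP_SIZE should make it clear whether the memory that oom_killer reclaims is heap or something allocated outside it.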