Thanks for the quick reply.

1. I don't know what thresholds I should look for. So, to save the back-and-forth, I'm attaching the cfstats output for the keyspace.
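In case it helps, the relevant lines can be pulled out of that output with something like this (the grep pattern and the cfhistograms call are just my guesses from your pointers, so please correct me if there is a better way):

    nodetool cfstats app_10001 | grep -E 'Table( \(index\))?:|Compacted partition maximum bytes|Maximum live cells per slice'

    # and, per table, the sstables-per-read histogram you mentioned:
    nodetool cfhistograms app_10001 daily_challenges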
There is one table - daily_challenges - which shows a compacted partition maximum of ~460MB, and another one - daily_guest_logins - which shows ~36MB. Can that be a problem?

Here is the CQL schema for the daily_challenges column family (a day-bucketed variant I'm wondering about is sketched at the very end of this mail, after the cfstats dump):

    CREATE TABLE app_10001.daily_challenges (
        segment_type text,
        date timestamp,
        user_id int,
        sess_id text,
        data text,
        deleted boolean,
        PRIMARY KEY (segment_type, date, user_id, sess_id)
    ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
        AND comment = ''
        AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
        AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99.0PERCENTILE';

    CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);

2. I don't know - how do I check? As I mentioned, I just installed the dsc21 package from DataStax's Debian repo (version 2.1.7). (The checks I'm planning to run are in the P.S. right after the quoted thread below.)

Really appreciate your help.

Thanks,
Kunal

On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.este...@datastax.com> wrote:

> 1. You want to look at # of sstables in cfhistograms, or in cfstats look at:
> Compacted partition maximum bytes
> Maximum live cells per slice
>
> 2) No, here's the env.sh from 3.0 which should work with some tweaks:
>
> https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh
>
> You'll at least have to modify the jamm version to what's in yours. I think it's 2.5
>
> All the best,
>
> Sebastián Estévez
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>
>> Thanks, Sebastian.
>>
>> Couple of questions (I'm really new to cassandra):
>> 1. How do I interpret the output of 'nodetool cfstats' to figure out the issues? Any documentation pointer on that would be helpful.
>>
>> 2. I'm primarily a python/c developer - so, totally clueless about the JVM environment. So, please bear with me as I would need a lot of hand-holding. Should I just copy+paste the settings you gave and try to restart the failing cassandra server?
>>
>> Thanks,
>> Kunal
>>
>> On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>
>>> #1 You need more information.
>>>
>>> a) Take a look at your .hprof file (the memory heap from the OOM) with an
>>> introspection tool like jhat or visualvm or Java Flight Recorder and see
>>> what is using up your RAM.
>>>
>>> b) How big are your large rows (use nodetool cfstats on each node)? If
>>> your data model is bad, you are going to have to re-design it no matter
>>> what.
>>>
>>> #2 As a possible workaround, try using the G1GC allocator with the
>>> settings from c* 3.0 instead of CMS. I've seen lots of success with it
>>> lately (tl;dr G1GC is much simpler than CMS and almost as good as a finely
>>> tuned CMS). *Note:* Use it with the latest Java 8 from Oracle. Do *not*
>>> set the newgen size; G1 sets it dynamically:
>>>
>>>> # min and max heap sizes should be set to the same value to avoid
>>>> # stop-the-world GC pauses during resize, and so that we can lock the
>>>> # heap in memory on startup to prevent any of it from being swapped
>>>> # out.
>>>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>>>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>>>
>>>> # Per-thread stack size.
>>>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>>>
>>>> # Use the Hotspot garbage-first collector.
>>>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>>>
>>>> # Have the JVM do less remembered set work during STW, instead
>>>> # preferring concurrent GC. Reduces p99.9 latency.
>>>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>>>
>>>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>>>> # Machines with > 10 cores may need additional threads.
>>>> # Increase to <= full cores (do not count HT cores).
>>>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>>>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>>>
>>>> # Main G1GC tunable: lowering the pause target will lower throughput
>>>> # and vice versa.
>>>> # 200ms is the JVM default and lowest viable setting.
>>>> # 1000ms increases throughput. Keep it smaller than the timeouts in
>>>> # cassandra.yaml.
>>>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>>>
>>>> # Do reference processing in parallel GC.
>>>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>>>
>>>> # This may help eliminate STW.
>>>> # The default in Hotspot 8u40 is 40%.
>>>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>>>
>>>> # For workloads that do large allocations, increasing the region
>>>> # size may make things more efficient. Otherwise, let the JVM
>>>> # set this automatically.
>>>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>>>
>>>> # Make sure all memory is faulted and zeroed on startup.
>>>> # This helps prevent soft faults in containers and makes
>>>> # transparent hugepage allocation more effective.
>>>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>>>
>>>> # Biased locking does not benefit Cassandra.
>>>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>>>
>>>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>>>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>>>
>>>> # Enable thread-local allocation blocks and allow the JVM to
>>>> # automatically resize them at runtime.
>>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>>>
>>>> # http://www.evanjones.ca/jvm-mmap-pause.html
>>>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>>>
>>> All the best,
>>>
>>> Sebastián Estévez
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>
>>>> I upgraded my instance from 8GB to a 14GB one.
>>>> Allocated 8GB to the JVM heap in cassandra-env.sh.
>>>>
>>>> And now, it crashes even faster with an OOM.
>>>>
>>>> Earlier, with a 4GB heap, I could get up to ~90% replication completion (as
>>>> reported by nodetool netstats); now, with an 8GB heap, I cannot even get
>>>> there. I've already restarted the cassandra service 4 times with the 8GB heap.
>>>>
>>>> No clue what's going on.. :(
>>>>
>>>> Kunal
>>>>
>>>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>
>>>>> You, and only you, are responsible for knowing your data and data
>>>>> model.
>>>>>
>>>>> If columns per row or rows per partition can be large, then an 8GB
>>>>> system is probably too small. But the real issue is that you need to keep
>>>>> your partition size from getting too large.
>>>>>
>>>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>>>> partitions, like under 10MB.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>
>>>>>> I'm new to cassandra.
>>>>>> How do I find those out - mainly, the partition params that you
>>>>>> asked for? The others, I think I can figure out.
>>>>>>
>>>>>> We don't have any large objects/blobs in the column values - it's all
>>>>>> textual, date-time, numeric and uuid data.
>>>>>>
>>>>>> We use cassandra primarily to store segmentation data - with segment
>>>>>> type as the partition key. That is again divided into two separate column
>>>>>> families, but they have a similar structure.
>>>>>>
>>>>>> Columns per row can be fairly large - each segment type is the row
>>>>>> key, with the associated user ids and timestamps as column values.
>>>>>>
>>>>>> Thanks,
>>>>>> Kunal
>>>>>>
>>>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>
>>>>>>> What does your data and data model look like - partition size, rows
>>>>>>> per partition, number of columns per row, any large values/blobs in column
>>>>>>> values?
>>>>>>>
>>>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>>>> partitions are reasonably small. Any large partitions could blow you
>>>>>>> away.
>>>>>>>
>>>>>>> -- Jack Krupansky
>>>>>>>
>>>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>>>
>>>>>>>> Kunal
>>>>>>>>
>>>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Forgot to mention: the data size is not that big - it's barely
>>>>>>>>> 10GB in all.
>>>>>>>>>
>>>>>>>>> Kunal
>>>>>>>>>
>>>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have a 2-node setup on Azure (East US region) running Ubuntu
>>>>>>>>>> Server 14.04 LTS.
>>>>>>>>>> Both nodes have 8GB RAM.
>>>>>>>>>>
>>>>>>>>>> One of the nodes (the seed node) died with an OOM - so, I am trying to
>>>>>>>>>> add a replacement node with the same configuration.
>>>>>>>>>>
>>>>>>>>>> The problem is this new node also keeps dying with OOM - I've
>>>>>>>>>> restarted the cassandra service like 8-10 times hoping that it would finish
>>>>>>>>>> the replication. But it didn't help.
>>>>>>>>>>
>>>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>>>> All nodes have a similar configuration - with libjna installed.
>>>>>>>>>>
>>>>>>>>>> Cassandra is installed from DataStax's debian repo - pkg: dsc21,
>>>>>>>>>> version 2.1.7.
>>>>>>>>>> I started off with the default configuration - i.e. the default
>>>>>>>>>> cassandra-env.sh - which calculates the heap size automatically (1/4 * RAM
>>>>>>>>>> = 2GB).
>>>>>>>>>>
>>>>>>>>>> But that didn't help. So, I then tried to increase the heap to
>>>>>>>>>> 4GB manually and restarted. It still keeps crashing.
>>>>>>>>>>
>>>>>>>>>> Any clue as to why it's happening?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kunal
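P.S. For your (a) and #2 above - before I try the env.sh from 3.0, is this roughly the right way to check things on my side? The paths are my guesses based on the dsc21 Debian package layout, and the .hprof path is just a placeholder, so please correct me if they are wrong:

    java -version                                           # should be the latest Oracle Java 8 for the G1 settings, right?
    ls /usr/share/cassandra/lib/jamm-*.jar                  # guessing this is where the package puts jamm (to get its version)
    grep -nE 'MAX_HEAP_SIZE|HEAP_NEWSIZE' /etc/cassandra/cassandra-env.sh   # current heap settings
    dpkg -l dsc21 cassandra                                 # confirm installed package versions (2.1.7 here)
    jhat -J-Xmx4g /path/to/java_pid<PID>.hprof              # to inspect the heap dump from the OOM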
Keyspace: app_10001
    Read Count: 15457730
    Read Latency: 7.596830444508994 ms.
    Write Count: 746281660
    Write Latency: 0.05196263861823966 ms.
    Pending Flushes: 0

        Table (index): daily_challenges.idx_deleted
        SSTable count: 403
        Space used (live): 478356941
        Space used (total): 478356941
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.5551512820038972
        Memtable cell count: 753
        Memtable data size: 45766
        Memtable switch count: 0
        Local read count: 6089050
        Local read latency: 11.951 ms
        Local write count: 15061824
        Local write latency: 0.029 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 6928
        Compacted partition minimum bytes: 87
        Compacted partition maximum bytes: 107964792
        Compacted partition mean bytes: 1169897
        Average live cells per slice (last five minutes): 2.999939727872164
        Maximum live cells per slice (last five minutes): 3.0
        Average tombstones per slice (last five minutes): 0.22855026646192755
        Maximum tombstones per slice (last five minutes): 25.0

        Table: daily_challenges
        SSTable count: 405
        Space used (live): 1362685827
        Space used (total): 1367330566
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.26546186350427925
        Memtable cell count: 1011
        Memtable data size: 349470
        Memtable switch count: 419
        Local read count: 9368599
        Local read latency: 4.761 ms
        Local write count: 7207110
        Local write latency: 0.161 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 7600
        Compacted partition minimum bytes: 14238
        Compacted partition maximum bytes: 464228842
        Compacted partition mean bytes: 13914141
        Average live cells per slice (last five minutes): 0.7820884466072269
        Maximum live cells per slice (last five minutes): 1.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table: daily_guest_logins
        SSTable count: 51
        Space used (live): 40426821
        Space used (total): 40426821
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.3649429689714488
        Memtable cell count: 641
        Memtable data size: 18900
        Memtable switch count: 51
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 9966344
        Local write latency: 0.039 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 976
        Compacted partition minimum bytes: 771
        Compacted partition maximum bytes: 36157190
        Compacted partition mean bytes: 2433128
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table: daily_new_registrations
        SSTable count: 41
        Space used (live): 9763317
        Space used (total): 9763317
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.42244627946544405
        Memtable cell count: 33
        Memtable data size: 1300
        Memtable switch count: 41
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 527893
        Local write latency: 0.038 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 816
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 12108970
        Compacted partition mean bytes: 772371
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table: daily_social_logins
        SSTable count: 41
        Space used (live): 8030277
        Space used (total): 8030277
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.4059610117700528
        Memtable cell count: 129
        Memtable data size: 3500
        Memtable switch count: 42
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 2858562
        Local write latency: 0.032 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 816
        Compacted partition minimum bytes: 125
        Compacted partition maximum bytes: 7007506
        Compacted partition mean bytes: 567414
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table (index): event_stream.index_event_stream_user_country
        SSTable count: 5445
        Space used (live): 1238089858
        Space used (total): 1238089858
        Space used by snapshots (total): 4870
        SSTable Compression Ratio: 0.35452645745931183
        Memtable cell count: 19841
        Memtable data size: 806362
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 121085782
        Local write latency: 0.020 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 429768
        Compacted partition minimum bytes: 61
        Compacted partition maximum bytes: 943127
        Compacted partition mean bytes: 13233
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table (index): event_stream.index_event_stream_user_ip
        SSTable count: 5445
        Space used (live): 1354666160
        Space used (total): 1354666160
        Space used by snapshots (total): 4910
        SSTable Compression Ratio: 0.36008613590700006
        Memtable cell count: 19831
        Memtable data size: 802164
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 119946624
        Local write latency: 0.012 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 3235976
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 943127
        Compacted partition mean bytes: 1507
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table: event_stream
        SSTable count: 5445
        Space used (live): 7788128519
        Space used (total): 7788128519
        Space used by snapshots (total): 5441
        SSTable Compression Ratio: 0.21874022831548764
        Memtable cell count: 75433
        Memtable data size: 7676755
        Memtable switch count: 5446
        Local read count: 81
        Local read latency: 809.210 ms
        Local write count: 59563652
        Local write latency: 0.253 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 174224
        Compacted partition minimum bytes: 311
        Compacted partition maximum bytes: 8409007
        Compacted partition mean bytes: 577868
        Average live cells per slice (last five minutes): 87.42857142857143
        Maximum live cells per slice (last five minutes): 102.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table (index): user_event_map.index_user_event_map_evt_type
        SSTable count: 3166
        Space used (live): 1149599939
        Space used (total): 1149599939
        Space used by snapshots (total): 4956
        SSTable Compression Ratio: 0.46269090321350587
        Memtable cell count: 19408
        Memtable data size: 546700
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 117988685
        Local write latency: 0.019 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 101304
        Compacted partition minimum bytes: 61
        Compacted partition maximum bytes: 785939
        Compacted partition mean bytes: 68590
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table (index): user_event_map.index_user_event_map_user_country
        SSTable count: 3166
        Space used (live): 1084074127
        Space used (total): 1084074127
        Space used by snapshots (total): 4837
        SSTable Compression Ratio: 0.43881106861515695
        Memtable cell count: 19223
        Memtable data size: 542584
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 116460555
        Local write latency: 0.014 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 272200
        Compacted partition minimum bytes: 51
        Compacted partition maximum bytes: 654949
        Compacted partition mean bytes: 14641
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table (index): user_event_map.index_user_event_map_user_ip
        SSTable count: 3166
        Space used (live): 1175416923
        Space used (total): 1175416923
        Space used by snapshots (total): 4878
        SSTable Compression Ratio: 0.4422075743233993
        Memtable cell count: 19193
        Memtable data size: 538664
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 116060301
        Local write latency: 0.011 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 2515184
        Compacted partition minimum bytes: 61
        Compacted partition maximum bytes: 654949
        Compacted partition mean bytes: 1375
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

        Table: user_event_map
        SSTable count: 3166
        Space used (live): 3513898771
        Space used (total): 3513898771
        Space used by snapshots (total): 5098
        SSTable Compression Ratio: 0.282546418560261
        Memtable cell count: 39028
        Memtable data size: 2882597
        Memtable switch count: 3167
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 59554382
        Local write latency: 0.219 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 3791224
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 3379391
        Compacted partition mean bytes: 4442
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
----------------
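Also, given Jack's point about keeping partitions small and the ~460MB daily_challenges partition above: would re-modeling along these lines help? This is only a sketch of what I have in mind (the table name daily_challenges_by_day and the day column are made up by me); the idea is to add a day bucket to the partition key so a single segment_type can't grow without bound:

    CREATE TABLE app_10001.daily_challenges_by_day (   -- hypothetical name
        segment_type text,
        day          text,        -- e.g. '2015-07-10'; the extra bucket column
        date         timestamp,
        user_id      int,
        sess_id      text,
        data         text,
        deleted      boolean,
        PRIMARY KEY ((segment_type, day), date, user_id, sess_id)
    ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);

I understand reads would then have to supply both segment_type and the day bucket - is that the usual trade-off?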