And here is my cassandra-env.sh: https://gist.github.com/kunalg/2c092cb2450c62be9a20
Kunal

On 11 July 2015 at 00:04, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
> From the jhat output, the top 10 entries for "Instance Count for All Classes
> (excluding platform)" are:
>
> 2088223 instances of class org.apache.cassandra.db.BufferCell
> 1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
> 1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
> 630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
> 503687 instances of class org.apache.cassandra.db.BufferDeletedCell
> 378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref
> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
> 90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
> 71123 instances of class org.apache.cassandra.db.BufferDecoratedKey
>
> At the bottom of the page, it shows:
> Total of 8739510 instances occupying 193607512 bytes.
> JFYI.
>
> Kunal
>
> On 10 July 2015 at 23:49, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>
>> Thanks for the quick reply.
>>
>> 1. I don't know what thresholds I should look for. So, to save this
>> back-and-forth, I'm attaching the cfstats output for the keyspace.
>>
>> There is one table - daily_challenges - which shows compacted partition
>> max bytes of ~460M, and another one - daily_guest_logins - which shows
>> compacted partition max bytes of ~36M.
>>
>> Can that be a problem?
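To avoid eyeballing a whole cfstats dump for this, a filter along these lines can flag oversized tables automatically. This is only a sketch: the `nodetool_cfstats_sample` function below fakes two lines of `nodetool cfstats` output matching the numbers above (real output varies slightly by version), and the 100 MB threshold is an assumption, not an official limit.

```shell
#!/bin/sh
# Flag tables whose "Compacted partition maximum bytes" exceeds ~100 MB,
# a ballpark at which large partitions commonly start causing GC pressure.
# Hypothetical sample standing in for `nodetool cfstats` output:
nodetool_cfstats_sample() {
cat <<'EOF'
Table: daily_challenges
Compacted partition maximum bytes: 460000000
Table: daily_guest_logins
Compacted partition maximum bytes: 36000000
EOF
}

# Remember the current table name; print it when its max partition
# size crosses the threshold.
nodetool_cfstats_sample | awk '
  /Table:/ { table = $2 }
  /Compacted partition maximum bytes:/ {
    bytes = $NF
    if (bytes > 100 * 1024 * 1024)
      printf "%s: %.0f MB max partition\n", table, bytes / 1048576
  }'
```

With the sample input this prints only `daily_challenges`; the ~36 MB table stays under the threshold. In practice you would pipe the real `nodetool cfstats <keyspace>` output into the awk filter instead.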
>> Here is the CQL schema for the daily_challenges column family:
>>
>> CREATE TABLE app_10001.daily_challenges (
>>     segment_type text,
>>     date timestamp,
>>     user_id int,
>>     sess_id text,
>>     data text,
>>     deleted boolean,
>>     PRIMARY KEY (segment_type, date, user_id, sess_id)
>> ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
>>     AND bloom_filter_fp_chance = 0.01
>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>     AND comment = ''
>>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>     AND dclocal_read_repair_chance = 0.1
>>     AND default_time_to_live = 0
>>     AND gc_grace_seconds = 864000
>>     AND max_index_interval = 2048
>>     AND memtable_flush_period_in_ms = 0
>>     AND min_index_interval = 128
>>     AND read_repair_chance = 0.0
>>     AND speculative_retry = '99.0PERCENTILE';
>>
>> CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);
>>
>> 2. I don't know - how do I check? As I mentioned, I just installed the
>> dsc21 package from DataStax's debian repo (ver 2.1.7).
>>
>> Really appreciate your help.
>>
>> Thanks,
>> Kunal
>>
>> On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>
>>> 1. You want to look at the # of sstables in cfhistograms, or in cfstats look at:
>>> Compacted partition maximum bytes
>>> Maximum live cells per slice
>>>
>>> 2) No. Here's the env.sh from 3.0, which should work with some tweaks:
>>> https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh
>>>
>>> You'll at least have to modify the jamm version to what's in yours.
>>> I think it's 2.5.
>>>
>>> All the best,
>>>
>>> Sebastián Estévez
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> DataStax is the fastest, most scalable distributed database technology,
>>> delivering Apache Cassandra to the world's most innovative enterprises.
>>> DataStax is built to be agile, always-on, and predictably scalable to any
>>> size. With more than 500 customers in 45 countries, DataStax is the
>>> database technology and transactional backbone of choice for the world's
>>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>>
>>> On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>
>>>> Thanks, Sebastian.
>>>>
>>>> A couple of questions (I'm really new to Cassandra):
>>>> 1. How do I interpret the output of 'nodetool cfstats' to figure out
>>>> the issues? Any documentation pointer on that would be helpful.
>>>>
>>>> 2. I'm primarily a python/c developer - so, totally clueless about the
>>>> JVM environment. So, please bear with me, as I will need a lot of
>>>> hand-holding. Should I just copy+paste the settings you gave and try to
>>>> restart the failing cassandra server?
>>>>
>>>> Thanks,
>>>> Kunal
>>>>
>>>> On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>
>>>>> #1 You need more information.
>>>>>
>>>>> a) Take a look at your .hprof file (the memory heap dump from the OOM)
>>>>> with an introspection tool like jhat, visualvm, or Java Flight Recorder,
>>>>> and see what is using up your RAM.
>>>>>
>>>>> b) How big are your large rows (use nodetool cfstats on each node)? If
>>>>> your data model is bad, you are going to have to re-design it no matter
>>>>> what.
>>>>>
>>>>> #2 As a possible workaround, try using the G1GC collector with the
>>>>> settings from C* 3.0 instead of CMS. I've seen lots of success with it
>>>>> lately (tl;dr G1GC is much simpler than CMS and almost as good as a
>>>>> finely tuned CMS). *Note:* Use it with the latest Java 8 from Oracle.
>>>>> Do *not* set the newgen size; G1 sets it dynamically:
>>>>>
>>>>>> # min and max heap sizes should be set to the same value to avoid
>>>>>> # stop-the-world GC pauses during resize, and so that we can lock the
>>>>>> # heap in memory on startup to prevent any of it from being swapped
>>>>>> # out.
>>>>>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>>>>>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>>>>>
>>>>>> # Per-thread stack size.
>>>>>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>>>>>
>>>>>> # Use the Hotspot garbage-first collector.
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>>>>>
>>>>>> # Have the JVM do less remembered set work during STW, instead
>>>>>> # preferring concurrent GC. Reduces p99.9 latency.
>>>>>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>>>>>
>>>>>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>>>>>> # Machines with > 10 cores may need additional threads.
>>>>>> # Increase to <= full cores (do not count HT cores).
>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>>>>>
>>>>>> # Main G1GC tunable: lowering the pause target will lower throughput
>>>>>> # and vice versa.
>>>>>> # 200ms is the JVM default and lowest viable setting.
>>>>>> # 1000ms increases throughput. Keep it smaller than the timeouts in
>>>>>> # cassandra.yaml.
>>>>>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>>>>>
>>>>>> # Do reference processing in parallel GC.
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>>>>>
>>>>>> # This may help eliminate STW.
>>>>>> # The default in Hotspot 8u40 is 40%.
>>>>>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>>>>>
>>>>>> # For workloads that do large allocations, increasing the region
>>>>>> # size may make things more efficient. Otherwise, let the JVM
>>>>>> # set this automatically.
>>>>>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>>>>>
>>>>>> # Make sure all memory is faulted and zeroed on startup.
>>>>>> # This helps prevent soft faults in containers and makes
>>>>>> # transparent hugepage allocation more effective.
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>>>>>
>>>>>> # Biased locking does not benefit Cassandra.
>>>>>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>>>>>
>>>>>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>>>>>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>>>>>
>>>>>> # Enable thread-local allocation blocks and allow the JVM to
>>>>>> # automatically resize them at runtime.
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>>>>>
>>>>>> # http://www.evanjones.ca/jvm-mmap-pause.html
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>>>>>
>>>>> All the best,
>>>>>
>>>>> Sebastián Estévez
>>>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
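Since a stray typo in these options can keep the JVM from starting at all, it may be worth sanity-checking the assembled JVM_OPTS before restarting the node. A minimal sketch, assuming the Xms/Xmx and pause-target conventions from the snippet above (the 8G heap value here is only an example, not a recommendation):

```shell
#!/bin/sh
# Assemble the heap/G1 options the same way cassandra-env.sh does, then
# verify two invariants from the comments above: min and max heap must
# match, and the pause target must stay below the ~1000 ms ceiling.
MAX_HEAP_SIZE="8G"
JVM_OPTS="-Xms${MAX_HEAP_SIZE} -Xmx${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=500"

# Pull the values back out of the option string.
xms=$(echo "$JVM_OPTS" | grep -o 'Xms[^ ]*' | cut -c4-)
xmx=$(echo "$JVM_OPTS" | grep -o 'Xmx[^ ]*' | cut -c4-)
pause=$(echo "$JVM_OPTS" | grep -o 'MaxGCPauseMillis=[0-9]*' | cut -d= -f2)

[ "$xms" = "$xmx" ] || { echo "heap min/max differ: $xms vs $xmx"; exit 1; }
[ "$pause" -lt 1000 ] || { echo "pause target too high: ${pause}ms"; exit 1; }
echo "JVM_OPTS look sane"
```

Running a check like this against the edited cassandra-env.sh (e.g. by sourcing it first) catches mismatched heap bounds before a restart does.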
>>>>>
>>>>> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>
>>>>>> I upgraded my instance from 8GB to a 14GB one, and
>>>>>> allocated 8GB to the JVM heap in cassandra-env.sh.
>>>>>>
>>>>>> And now, it crashes even faster with an OOM..
>>>>>>
>>>>>> Earlier, with the 4GB heap, I could get up to ~90% replication completion
>>>>>> (as reported by nodetool netstats); now, with the 8GB heap, I cannot even
>>>>>> get there. I've already restarted the cassandra service 4 times with the
>>>>>> 8GB heap.
>>>>>>
>>>>>> No clue what's going on.. :(
>>>>>>
>>>>>> Kunal
>>>>>>
>>>>>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>
>>>>>>> You, and only you, are responsible for knowing your data and data
>>>>>>> model.
>>>>>>>
>>>>>>> If columns per row or rows per partition can be large, then an 8GB
>>>>>>> system is probably too small. But the real issue is that you need to
>>>>>>> keep your partition size from getting too large.
>>>>>>>
>>>>>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>>>>>> partitions, like under 10MB.
>>>>>>>
>>>>>>> -- Jack Krupansky
>>>>>>>
>>>>>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm new to Cassandra.
>>>>>>>> How do I find those out? - mainly, the partition params that you
>>>>>>>> asked for. The others, I think I can figure out.
>>>>>>>>
>>>>>>>> We don't have any large objects/blobs in the column values - it's
>>>>>>>> all textual, date-time, numeric and uuid data.
>>>>>>>>
>>>>>>>> We use cassandra primarily to store segmentation data - with
>>>>>>>> segment type as the partition key.
>>>>>>>> That is again divided into two separate
>>>>>>>> column families; but they have a similar structure.
>>>>>>>>
>>>>>>>> Columns per row can be fairly large - each segment type is the row
>>>>>>>> key, with the associated user ids and a timestamp as the column value.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kunal
>>>>>>>>
>>>>>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> What does your data and data model look like - partition size,
>>>>>>>>> rows per partition, number of columns per row, any large values/blobs
>>>>>>>>> in column values?
>>>>>>>>>
>>>>>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>>>>>> partitions are reasonably small. Any large partitions could blow you
>>>>>>>>> away.
>>>>>>>>>
>>>>>>>>> -- Jack Krupansky
>>>>>>>>>
>>>>>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>>>>>
>>>>>>>>>> Kunal
>>>>>>>>>>
>>>>>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Forgot to mention: the data size is not that big - it's barely
>>>>>>>>>>> 10GB in all.
>>>>>>>>>>>
>>>>>>>>>>> Kunal
>>>>>>>>>>>
>>>>>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a 2-node setup on Azure (East US region) running Ubuntu
>>>>>>>>>>>> Server 14.04 LTS.
>>>>>>>>>>>> Both nodes have 8GB RAM.
>>>>>>>>>>>>
>>>>>>>>>>>> One of the nodes (the seed node) died with an OOM - so, I am
>>>>>>>>>>>> trying to add a replacement node with the same configuration.
>>>>>>>>>>>>
>>>>>>>>>>>> The problem is this new node also keeps dying with an OOM - I've
>>>>>>>>>>>> restarted the cassandra service like 8-10 times hoping that it
>>>>>>>>>>>> would finish the replication. But it didn't help.
>>>>>>>>>>>>
>>>>>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>>>>>> All nodes have a similar configuration - with libjna installed.
>>>>>>>>>>>>
>>>>>>>>>>>> Cassandra is installed from DataStax's debian repo - pkg: dsc21,
>>>>>>>>>>>> version 2.1.7.
>>>>>>>>>>>> I started off with the default configuration - i.e. the default
>>>>>>>>>>>> cassandra-env.sh - which calculates the heap size automatically
>>>>>>>>>>>> (1/4 * RAM = 2GB).
>>>>>>>>>>>>
>>>>>>>>>>>> But that didn't help. So, I then tried to increase the heap to
>>>>>>>>>>>> 4GB manually and restarted. It still keeps crashing.
>>>>>>>>>>>>
>>>>>>>>>>>> Any clue as to why it's happening?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Kunal
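For reference, the automatic sizing described above can be reproduced outside of cassandra-env.sh. This sketch mirrors the stock 2.1 formula as I read it (the larger of: half the RAM capped at 1 GB, and a quarter of the RAM capped at 8 GB), which is how an 8 GB box ends up with a 2 GB default heap:

```shell
#!/bin/sh
# Mirror the default MAX_HEAP_SIZE calculation from the stock
# cassandra-env.sh (2.1): max(min(ram/2, 1024 MB), min(ram/4, 8192 MB)).
calculate_heap_mb() {
  ram_mb=$1
  half=$(( ram_mb / 2 ))
  if [ "$half" -gt 1024 ]; then half=1024; fi
  quarter=$(( ram_mb / 4 ))
  if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
  # Take the larger of the two capped values.
  if [ "$half" -gt "$quarter" ]; then echo "$half"; else echo "$quarter"; fi
}

# An 8 GB node gets a 2 GB default heap, as reported in the thread.
calculate_heap_mb 8192    # prints 2048
```

By this formula a 14 GB node would default to about 3.5 GB, so the 8 GB heap used above is a manual override; note that heap size alone did not fix the OOM here, which points back at partition size.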