#1 You need more information.

a) Take a look at your .hprof file (the heap dump the JVM writes on OOM) with an
inspection tool like jhat, VisualVM, or Java Flight Recorder and see what is
actually using up your RAM.

b) How big are your largest rows/partitions? (Use nodetool cfstats on each node.) If your
data model is bad, you are going to have to redesign it no matter what. Rough
commands for both checks are sketched below.
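
A minimal sketch, assuming a single local node and the JDK tools on your PATH; the
dump path and the keyspace/table names are placeholders, not from this thread:

    # Force a heap dump from the live process if you don't already have the .hprof
    # from the OOM (Cassandra's env script enables HeapDumpOnOutOfMemoryError):
    jmap -dump:format=b,file=/tmp/cassandra-heap.hprof $(pgrep -f CassandraDaemon)

    # Serve the dump on http://localhost:7000 (use -port if 7000 clashes with
    # Cassandra's storage port) and check the histogram of which classes hold
    # the most heap:
    jhat -J-Xmx6g /tmp/cassandra-heap.hprof

    # Per-table partition sizes; "Compacted partition maximum bytes" is the one
    # to watch (keyspace/table here are placeholders):
    nodetool cfstats my_keyspace.my_table | grep -i 'compacted partition'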

#2 As a possible workaround, try the G1 garbage collector with the settings
from C* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr:
G1 is much simpler to tune than CMS and almost as good as a finely tuned CMS).
*Note:* use it with the latest Java 8 from Oracle, and do *not* set the
new-gen size; G1 sizes it dynamically:

> # min and max heap sizes should be set to the same value to avoid
> # stop-the-world GC pauses during resize, and so that we can lock the
> # heap in memory on startup to prevent any of it from being swapped
> # out.
> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>
> # Per-thread stack size.
> JVM_OPTS="$JVM_OPTS -Xss256k"
>
> # Use the Hotspot garbage-first collector.
> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>
> # Have the JVM do less remembered set work during STW, instead
> # preferring concurrent GC. Reduces p99.9 latency.
> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>
> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
> # Machines with > 10 cores may need additional threads.
> # Increase to <= full cores (do not count HT cores).
> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>
> # Main G1GC tunable: lowering the pause target will lower throughput and
> # vice versa.
> # 200ms is the JVM default and lowest viable setting.
> # 1000ms increases throughput. Keep it smaller than the timeouts in
> # cassandra.yaml.
> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
> # Do reference processing in parallel GC.
> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>
> # This may help eliminate STW.
> # The default in Hotspot 8u40 is 40%.
> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>
> # For workloads that do large allocations, increasing the region
> # size may make things more efficient. Otherwise, let the JVM
> # set this automatically.
> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>
> # Make sure all memory is faulted and zeroed on startup.
> # This helps prevent soft faults in containers and makes
> # transparent hugepage allocation more effective.
> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>
> # Biased locking does not benefit Cassandra.
> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>
> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>
> # Enable thread-local allocation blocks and allow the JVM to automatically
> # resize them at runtime.
> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>
> # http://www.evanjones.ca/jvm-mmap-pause.html
> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
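
On 2.1 these flags go straight into cassandra-env.sh (the separate jvm.options
file only appears in 3.0), and you will want to comment out the existing CMS
flags (-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, and friends) so they don't
conflict. A quick sanity check, just a sketch, to confirm the node really came
up with G1 and the heap size you intended:

    # Show the heap- and GC-related JVM args of the running daemon:
    ps -ef | grep CassandraDaemon | tr ' ' '\n' | grep -E '^-Xm[sx]|UseG1GC'

    # Watch GC activity live (per-generation occupancy plus GC counts/times),
    # sampling every 5 seconds:
    jstat -gcutil $(pgrep -f CassandraDaemon) 5s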


All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <
kgangakhed...@gmail.com> wrote:

> I upgraded my instance from 8GB to a 14GB one.
> Allocated 8GB to jvm heap in cassandra-env.sh.
>
> And now, it crashes even faster with an OOM..
>
> Earlier, with 4GB heap, I could go up to ~90% replication completion (as
> reported by nodetool netstats); now, with 8GB heap, I cannot even get
> there. I've already restarted cassandra service 4 times with 8GB heap.
>
> No clue what's going on.. :(
>
> Kunal
>
> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> You, and only you, are responsible for knowing your data and data model.
>>
>> If columns per row or rows per partition can be large, then an 8GB system
>> is probably too small. But the real issue is that you need to keep your
>> partition size from getting too large.
>>
>> Generally, an 8GB system is okay, but only for reasonably-sized
>> partitions, like under 10MB.
>>
>>
>> -- Jack Krupansky
>>
>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <
>> kgangakhed...@gmail.com> wrote:
>>
>>> I'm new to Cassandra.
>>> How do I find those out? - mainly, the partition params that you asked
>>> for. Others, I think I can figure out.
>>>
>>> We don't have any large objects/blobs in the column values - it's all
>>> textual, date-time, numeric and uuid data.
>>>
>>> We use cassandra to primarily store segmentation data - with segment
>>> type as partition key. That is again divided into two separate column
>>> families; but they have similar structure.
>>>
>>> Columns per row can be fairly large - each segment type as the row key
>>> and associated user ids and timestamp as column value.
>>>
>>> Thanks,
>>> Kunal
>>>
>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com>
>>> wrote:
>>>
>>>> What does your data and data model look like - partition size, rows per
>>>> partition, number of columns per row, any large values/blobs in column
>>>> values?
>>>>
>>>> You could run fine on an 8GB system, but only if your rows and
>>>> partitions are reasonably small. Any large partitions could blow you away.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <
>>>> kgangakhed...@gmail.com> wrote:
>>>>
>>>>> Attaching the stack dump captured from the last OOM.
>>>>>
>>>>> Kunal
>>>>>
>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Forgot to mention: the data size is not that big - it's barely 10GB
>>>>>> in all.
>>>>>>
>>>>>> Kunal
>>>>>>
>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a 2 node setup on Azure (east us region) running Ubuntu
>>>>>>> server 14.04LTS.
>>>>>>> Both nodes have 8GB RAM.
>>>>>>>
>>>>>>> One of the nodes (seed node) died with OOM - so, I am trying to add
>>>>>>> a replacement node with same configuration.
>>>>>>>
>>>>>>> The problem is this new node also keeps dying with OOM - I've
>>>>>>> restarted the cassandra service like 8-10 times hoping that it would 
>>>>>>> finish
>>>>>>> the replication. But it didn't help.
>>>>>>>
>>>>>>> The one node that is still up is happily chugging along.
>>>>>>> All nodes have similar configuration - with libjna installed.
>>>>>>>
>>>>>>> Cassandra is installed from datastax's debian repo - pkg: dsc21
>>>>>>> version 2.1.7.
>>>>>>> I started off with the default configuration - i.e. the default
>>>>>>> cassandra-env.sh - which calculates the heap size automatically (1/4 * 
>>>>>>> RAM
>>>>>>> = 2GB)
>>>>>>>
>>>>>>> But, that didn't help. So, I then tried to increase the heap to 4GB
>>>>>>> manually and restarted. It still keeps crashing.
>>>>>>>
>>>>>>> Any clue as to why it's happening?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kunal
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
