We faced a similar issue where we had 60k sstables due to the coldness bug in 2.0.3. 
We solved it by following the DataStax recommended production settings at 
http://docs.datastax.com/en/cassandra/1.2/cassandra/install/installRecommendSettings.html
 :


Step 1: Add the following line to /etc/sysctl.conf:

 

vm.max_map_count = 131072

 

Step 2: To make the changes take effect, reboot the server or run the following 
command:

 

$ sudo sysctl -p

 

Step 3 (optional): To confirm the limits are applied to the Cassandra process, 
run the following command, where pid is the process ID of the currently running 
Cassandra process:

 

$ cat /proc/<pid>/limits
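
Note that /proc/<pid>/limits lists per-process resource limits (ulimits), while vm.max_map_count is a kernel-wide setting that Linux exposes at /proc/sys/vm/max_map_count. Below is a minimal Python sketch for comparing the configured value with the live one. The function names are made up for illustration and are not part of any Cassandra or DataStax tooling.

```python
from pathlib import Path
from typing import Optional


def parse_max_map_count(sysctl_conf_text: str) -> Optional[int]:
    """Return the last vm.max_map_count value set in sysctl.conf-style text."""
    value = None
    for raw in sysctl_conf_text.splitlines():
        line = raw.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.max_map_count":
            value = int(val.strip())
    return value


def live_max_map_count(proc_path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the value the running kernel is actually using (Linux only)."""
    return int(Path(proc_path).read_text().strip())
```

On the node itself, parse_max_map_count(Path("/etc/sysctl.conf").read_text()) and live_max_map_count() should both report 131072 once Step 2 has been done.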



You can try the above settings and share your results.


Thanks

Anuj


From:"Sebastian Estevez" <sebastian.este...@datastax.com>
Date:Mon, 13 Jul, 2015 at 7:02 pm
Subject:Re: Cassandra OOM on joining existing ring

Are you on the azure premium storage?
http://www.datastax.com/2015/04/getting-started-with-azure-premium-storage-and-datastax-enterprise-dse

Secondary indexes are built for convenience, not performance.
http://www.datastax.com/resources/data-modeling

What's your compaction strategy? Your nodes have to come up in order for them 
to start compacting. 

On Jul 13, 2015 1:11 AM, "Kunal Gangakhedkar" <kgangakhed...@gmail.com> wrote:

Hi,


Looks like that is my primary problem - the sstable count for the 
daily_challenges column family is >5k. Azure had a scheduled maintenance window 
on Saturday. All the VMs got rebooted one by one - including the current cassandra 
one - and it's taking forever to bring cassandra back online.


Is there any way I can reorganize my existing data so that I can bring down 
that count?

I don't want to lose that data.

If possible, can I do that while cassandra is down? As I mentioned, it's taking 
forever to get the service up - it's stuck in reading those 5k sstable (+ 
another 5k of corresponding secondary index) files. :(
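
For what it's worth, the sstable count can be checked on disk without starting Cassandra, since each sstable has exactly one file ending in -Data.db. Below is a rough Python sketch. The function name is hypothetical, and the data directory mentioned in the usage note is an assumption (the usual package default), so adjust it to whatever data_file_directories says in cassandra.yaml.

```python
import os
from collections import Counter


def count_sstables(data_dir: str) -> Counter:
    """Count '*-Data.db' files per directory under data_dir.

    Each sstable has one Data.db component, so the number of such files in a
    table's directory is that table's sstable count. Snapshot and backup
    subdirectories are walked too and show up as separate keys.
    """
    counts = Counter()
    for root, _dirs, files in os.walk(data_dir):
        n = sum(1 for name in files if name.endswith("-Data.db"))
        if n:
            counts[root] += n
    return counts
```

Something like count_sstables("/var/lib/cassandra/data/app_10001").most_common(5) would list the directories holding the most sstables.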

Oh, did I mention I'm new to cassandra?


Thanks,

Kunal




On 11 July 2015 at 03:29, Sebastian Estevez <sebastian.este...@datastax.com> 
wrote:

#1 

There is one table - daily_challenges - which shows compacted partition max 
bytes as ~460M and another one - daily_guest_logins - which shows compacted 
partition max bytes as ~36M.


460 MB is high; I like to keep my partitions under 100 MB when possible. I've seen 
worse though. The fix is to add something else (maybe month or week or 
something) into your partition key:


 PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)
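
As an illustration of the "something_else" idea, a month bucket derived from the row's date would split each segment_type into one partition per month. Below is a Python sketch of how the application side might compute it. The helper names and the "YYYY-MM" format are assumptions made for this example, not something the schema above prescribes.

```python
from datetime import datetime


def month_bucket(ts: datetime) -> str:
    """Return a 'YYYY-MM' bucket to use as the extra partition key column."""
    return ts.strftime("%Y-%m")


def partition_key(segment_type: str, ts: datetime) -> tuple:
    """Composite partition key: (segment_type, month bucket)."""
    return (segment_type, month_bucket(ts))


def buckets_between(start: datetime, end: datetime) -> list:
    """List every month bucket touched by the range [start, end]."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets
```

The trade-off is that reads must now name the bucket as well as the segment_type, so a query over a date range becomes one query per bucket returned by buckets_between.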


#2 Looks like your jamm version is 3 per your env.sh, so you're probably okay to 
copy the env.sh over from the C* 3.0 link I shared once you uncomment and tweak 
the MAX_HEAP. If there's something wrong, your node won't come up. Tail your 
logs.




All the best,




Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

    





DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the worlds most innovative 
companies such as Netflix, Adobe, Intuit, and eBay. 


On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> 
wrote:

And here is my cassandra-env.sh

https://gist.github.com/kunalg/2c092cb2450c62be9a20


Kunal


On 11 July 2015 at 00:04, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:

From jhat output, top 10 entries for "Instance Count for All Classes (excluding 
platform)" shows:

2088223 instances of class org.apache.cassandra.db.BufferCell 
1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName 
1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName 
630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo 
503687 instances of class org.apache.cassandra.db.BufferDeletedCell 
378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier 
101800 instances of class org.apache.cassandra.utils.concurrent.Ref 
101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State 
90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState 
71123 instances of class org.apache.cassandra.db.BufferDecoratedKey 


At the bottom of the page, it shows: 

Total of 8739510 instances occupying 193607512 bytes.

JFYI.


Kunal


On 10 July 2015 at 23:49, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:

Thanks for quick reply.

1. I don't know what thresholds I should look for. So, to save 
the back-and-forth, I'm attaching the cfstats output for the keyspace.

There is one table - daily_challenges - which shows compacted partition max 
bytes as ~460M and another one - daily_guest_logins - which shows compacted 
partition max bytes as ~36M.

Can that be a problem? 

Here is the CQL schema for the daily_challenges column family:

CREATE TABLE app_10001.daily_challenges (
    segment_type text,
    date timestamp,
    user_id int,
    sess_id text,
    data text,
    deleted boolean,
    PRIMARY KEY (segment_type, date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);


2. I don't know - how do I check? As I mentioned, I just installed the dsc21 
update from datastax's debian repo (ver 2.1.7).

Really appreciate your help.


Thanks,

Kunal


On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.este...@datastax.com> 
wrote:

1. You want to look at the number of sstables in cfhistograms, or in cfstats look at:

Compacted partition maximum bytes

Maximum live cells per slice
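
To avoid eyeballing the cfstats output, the fields above can be pulled out with a small script. Below is a Python sketch that assumes the output has "Table: <name>" lines followed by "label: value" lines, which is how 2.1 prints it as far as I know; the exact labels may differ between versions. The sample text and the 100 MB threshold are illustrative assumptions, not real output from this cluster.

```python
def parse_cfstats(text: str) -> dict:
    """Group 'label: value' lines of cfstats-style text by table name."""
    tables = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Keyspace":
            current = None  # keyspace-level stats are not tied to a table
        elif key == "Table":
            current = tables.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return tables


def large_partitions(tables: dict, limit_bytes: int = 100 * 1024 * 1024) -> dict:
    """Return {table: max partition bytes} for tables above limit_bytes."""
    found = {}
    for name, stats in tables.items():
        raw = stats.get("Compacted partition maximum bytes")
        if raw is not None and int(raw) > limit_bytes:
            found[name] = int(raw)
    return found


# Made-up sample in the assumed format, for illustration only.
SAMPLE = """
Keyspace: app_10001
    Table: daily_challenges
        SSTable count: 5000
        Compacted partition maximum bytes: 482344960
    Table: daily_guest_logins
        Compacted partition maximum bytes: 37748736
"""
```

In practice the text would come from running nodetool cfstats on each node and passing the captured output to parse_cfstats.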


2) No, here's the env.sh from 3.0 which should work with some tweaks:

https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh


You'll at least have to modify the jamm version to match what's in yours. I think 
it's 2.5.




All the best,




Sebastián Estévez



On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> 
wrote:

Thanks, Sebastian.

Couple of questions (I'm really new to cassandra):

1. How do I interpret the output of 'nodetool cfstats' to figure out the 
issues? Any documentation pointer on that would be helpful.

2. I'm primarily a Python/C developer, so I'm totally clueless about the JVM 
environment. Please bear with me, as I will need a lot of hand-holding.
Should I just copy+paste the settings you gave and try to restart the failing 
cassandra server?


Thanks,

Kunal


On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> 
wrote:

#1 You need more information. 


a) Take a look at your .hprof file (memory heap from the OOM) with an 
introspection tool like jhat or visualvm or java flight recorder and see what 
is using up your RAM.


b) How big are your large rows? (Use nodetool cfstats on each node.) If your 
data model is bad, you are going to have to re-design it no matter what.


#2 As a possible workaround, try using the G1 garbage collector (G1GC) with the settings from 
C* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr G1GC is 
much simpler than CMS and almost as good as a finely tuned CMS). Note: Use it 
with the latest Java 8 from Oracle. Do not set the newgen size, because G1 sets it 
dynamically:


# min and max heap sizes should be set to the same value to avoid
# stop-the-world GC pauses during resize, and so that we can lock the
# heap in memory on startup to prevent any of it from being swapped
# out.
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
 
# Per-thread stack size.
JVM_OPTS="$JVM_OPTS -Xss256k"
 
# Use the Hotspot garbage-first collector.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
 
# Have the JVM do less remembered set work during STW, instead
# preferring concurrent GC. Reduces p99.9 latency.
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
 
# The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
# Machines with > 10 cores may need additional threads.
# Increase to <= full cores (do not count HT cores).
#JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
#JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
 
# Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
# 200ms is the JVM default and lowest viable setting
# 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
# Do reference processing in parallel GC.
JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
 
# This may help eliminate STW.
# The default in Hotspot 8u40 is 40%.
#JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
 
# For workloads that do large allocations, increasing the region
# size may make things more efficient. Otherwise, let the JVM
# set this automatically.
#JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
 
# Make sure all memory is faulted and zeroed on startup.
# This helps prevent soft faults in containers and makes
# transparent hugepage allocation more effective.
JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
 
# Biased locking does not benefit Cassandra.
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
 
# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
 
# Enable thread-local allocation blocks and allow the JVM to automatically
# resize them at runtime.
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
 
# http://www.evanjones.ca/jvm-mmap-pause.html
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
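
Since the stock 2.1 cassandra-env.sh sets the new generation size with -Xmn, it is easy to leave that flag in by accident after switching to G1. Below is a small Python sketch that flags such leftovers in a JVM options string. The function name is hypothetical; the flags it looks for (-Xmn, -XX:NewSize, -XX:MaxNewSize) are the standard HotSpot ways of fixing the young generation size.

```python
def g1_newgen_conflicts(jvm_opts: str) -> list:
    """Return the options that pin the young generation size when G1 is on.

    An empty list means either G1 is not enabled or nothing conflicts.
    """
    opts = jvm_opts.split()
    if "-XX:+UseG1GC" not in opts:
        return []
    pinned = ("-Xmn", "-XX:NewSize=", "-XX:MaxNewSize=")
    return [opt for opt in opts if opt.startswith(pinned)]
```

For example, g1_newgen_conflicts("-Xms8G -Xmx8G -Xmn2G -XX:+UseG1GC") returns ["-Xmn2G"], which is the option to remove.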


All the best,




Sebastián Estévez



On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> 
wrote:

I upgraded my instance from 8GB to a 14GB one.

Allocated 8GB to jvm heap in cassandra-env.sh.

And now, it crashes even faster with an OOM.

Earlier, with 4GB heap, I could get up to ~90% replication completion (as 
reported by nodetool netstats); now, with 8GB heap, I cannot even get there. 
I've already restarted the cassandra service 4 times with 8GB heap.


No clue what's going on.. :(


Kunal


On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:

You, and only you, are responsible for knowing your data and data model.


If columns per row or rows per partition can be large, then an 8GB system is 
probably too small. But the real issue is that you need to keep your partition 
size from getting too large.


Generally, an 8GB system is okay, but only for reasonably-sized partitions, 
like under 10MB.



-- Jack Krupansky


On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> 
wrote:

I'm new to cassandra. How do I find those out - mainly, the partition params that 
you asked for? Others, I think I can figure out.


We don't have any large objects/blobs in the column values - it's all textual, 
date-time, numeric and uuid data.


We use cassandra to primarily store segmentation data - with segment type as 
partition key. That is again divided into two separate column families; but 
they have similar structure.

Columns per row can be fairly large - each segment type as the row key and 
associated user ids and timestamp as column value.

Thanks,

Kunal


On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:

What do your data and data model look like - partition size, rows per 
partition, number of columns per row, any large values/blobs in column values?


You could run fine on an 8GB system, but only if your rows and partitions are 
reasonably small. Any large partitions could blow you away.


-- Jack Krupansky


On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> 
wrote:

Attaching the stack dump captured from the last OOM.


Kunal


On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:

Forgot to mention: the data size is not that big - it's barely 10GB in all.


Kunal


On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:

Hi,

I have a 2 node setup on Azure (east us region) running Ubuntu server 14.04LTS.

Both nodes have 8GB RAM.

One of the nodes (seed node) died with OOM - so, I am trying to add a 
replacement node with same configuration.

The problem is this new node also keeps dying with OOM - I've restarted the 
cassandra service like 8-10 times hoping that it would finish the replication. 
But it didn't help.

The one node that is still up is happily chugging along.

All nodes have similar configuration - with libjna installed.

Cassandra is installed from datastax's debian repo - pkg: dsc21 version 2.1.7.

I started off with the default configuration - i.e. the default 
cassandra-env.sh - which calculates the heap size automatically (1/4 * RAM = 
2GB)
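
For reference, the automatic sizing in the stock cassandra-env.sh is documented there as max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)). Below is a Python sketch of that rule, to show where the 2GB figure comes from. The function name is made up for this example, and it takes the RAM size as an argument rather than detecting it the way the shell script does.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Mirror the heap rule: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB))."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)
```

With 8GB of RAM, default_max_heap_mb(8192) gives 2048, i.e. the 2GB mentioned above; a 14GB machine would get 3584MB by the same rule.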

But, that didn't help. So, I then tried to increase the heap to 4GB manually 
and restarted. It still keeps crashing.

Any clue as to why it's happening?


Thanks,

Kunal














