Hi,

Looks like that is my primary problem - the sstable count for the daily_challenges column family is >5k.

Azure had a scheduled maintenance window on Saturday. All the VMs got rebooted one by one - including the current cassandra one - and it's taking forever to bring cassandra back up online.
Is there any way I can re-organize my existing data so that I can bring down that count? I don't want to lose that data. If possible, can I do that while cassandra is down? As I mentioned, it's taking forever to get the service up - it's stuck reading those 5k sstable files (plus another 5k corresponding secondary index files). :(

Oh, did I mention I'm new to cassandra?

Thanks,
Kunal

On 11 July 2015 at 03:29, Sebastian Estevez <sebastian.este...@datastax.com> wrote:

> #1
>
>> There is one table - daily_challenges - which shows compacted partition
>> max bytes as ~460M and another one - daily_guest_logins - which shows
>> compacted partition max bytes as ~36M.
>
> 460M is high; I like to keep my partitions under 100MB when possible. I've
> seen worse, though. The fix is to add something else (maybe month or week or
> something) into your partition key:
>
> PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)
>
> #2 Looks like your jamm version is 3 per your env.sh, so you're probably
> okay to copy the env.sh over from the C* 3.0 link I shared once you
> uncomment and tweak the MAX_HEAP. If there's something wrong, your node
> won't come up; tail your logs.
>
> All the best,
>
> Sebastián Estévez
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>
>> And here is my cassandra-env.sh
>> https://gist.github.com/kunalg/2c092cb2450c62be9a20
>>
>> Kunal
>>
>> On 11 July 2015 at 00:04, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>
>>> From jhat output, the top 10 entries for "Instance Count for All Classes
>>> (excluding platform)" show:
>>>
>>> 2088223 instances of class org.apache.cassandra.db.BufferCell
>>> 1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
>>> 1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
>>> 630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
>>> 503687 instances of class org.apache.cassandra.db.BufferDeletedCell
>>> 378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
>>> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref
>>> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
>>> 90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
>>> 71123 instances of class org.apache.cassandra.db.BufferDecoratedKey
>>>
>>> At the bottom of the page, it shows:
>>> Total of 8739510 instances occupying 193607512 bytes.
>>> JFYI.
>>>
>>> Kunal
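
For illustration, here is a minimal sketch of what Sebastian's #1 suggestion above could look like for daily_challenges, assuming a month-based bucket (the table name, the "month" column and the bucket granularity are illustrative assumptions, not part of his advice; the current schema is quoted further down the thread):

CREATE TABLE app_10001.daily_challenges_by_month (
    segment_type text,
    month text,        -- e.g. '2015-07'; caps each partition at one month of data
    date timestamp,
    user_id int,
    sess_id text,
    data text,
    deleted boolean,
    PRIMARY KEY ((segment_type, month), date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);

Reads would then need to supply both segment_type and month, and existing rows would have to be copied into the new table (for example with a small script or cqlsh's COPY), since a table's partition key cannot be altered in place.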
>>> On 10 July 2015 at 23:49, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>
>>>> Thanks for the quick reply.
>>>>
>>>> 1. I don't know what thresholds I should look for. So, to save this
>>>> back-and-forth, I'm attaching the cfstats output for the keyspace.
>>>>
>>>> There is one table - daily_challenges - which shows compacted partition
>>>> max bytes as ~460M and another one - daily_guest_logins - which shows
>>>> compacted partition max bytes as ~36M.
>>>>
>>>> Can that be a problem?
>>>> Here is the CQL schema for the daily_challenges column family:
>>>>
>>>> CREATE TABLE app_10001.daily_challenges (
>>>>     segment_type text,
>>>>     date timestamp,
>>>>     user_id int,
>>>>     sess_id text,
>>>>     data text,
>>>>     deleted boolean,
>>>>     PRIMARY KEY (segment_type, date, user_id, sess_id)
>>>> ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
>>>>     AND bloom_filter_fp_chance = 0.01
>>>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>>>     AND comment = ''
>>>>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>>>>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>     AND dclocal_read_repair_chance = 0.1
>>>>     AND default_time_to_live = 0
>>>>     AND gc_grace_seconds = 864000
>>>>     AND max_index_interval = 2048
>>>>     AND memtable_flush_period_in_ms = 0
>>>>     AND min_index_interval = 128
>>>>     AND read_repair_chance = 0.0
>>>>     AND speculative_retry = '99.0PERCENTILE';
>>>>
>>>> CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);
>>>>
>>>> 2. I don't know - how do I check? As I mentioned, I just installed the
>>>> dsc21 update from datastax's debian repo (ver 2.1.7).
>>>>
>>>> Really appreciate your help.
>>>>
>>>> Thanks,
>>>> Kunal
>>>>
>>>> On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>
>>>>> 1. You want to look at the # of sstables in cfhistograms, and in cfstats look at:
>>>>>     Compacted partition maximum bytes
>>>>>     Maximum live cells per slice
>>>>>
>>>>> 2. No. Here's the env.sh from 3.0, which should work with some tweaks:
>>>>>
>>>>> https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh
>>>>>
>>>>> You'll at least have to modify the jamm version to what's in yours. I
>>>>> think it's 2.5.
>>>>>
>>>>> All the best,
>>>>>
>>>>> Sebastián Estévez
>>>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
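
As a quick sketch of the checks Sebastian describes above, using the keyspace and table from this thread (cfstats and cfhistograms are the 2.1-era names of these nodetool subcommands):

# Per-table stats: look for "SSTable count", "Compacted partition maximum bytes"
# and "Maximum live cells per slice" in the output.
nodetool cfstats app_10001.daily_challenges

# Per-table histograms: the SSTables column shows how many sstables a read
# typically has to touch.
nodetool cfhistograms app_10001 daily_challenges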
>>>>> On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>
>>>>>> Thanks, Sebastian.
>>>>>>
>>>>>> Couple of questions (I'm really new to cassandra):
>>>>>> 1. How do I interpret the output of 'nodetool cfstats' to figure out
>>>>>> the issues? Any documentation pointer on that would be helpful.
>>>>>>
>>>>>> 2. I'm primarily a python/c developer - so, totally clueless about the
>>>>>> JVM environment. So, please bear with me as I would need a lot of
>>>>>> hand-holding.
>>>>>> Should I just copy+paste the settings you gave and try to restart the
>>>>>> failing cassandra server?
>>>>>>
>>>>>> Thanks,
>>>>>> Kunal
>>>>>>
>>>>>> On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>>>
>>>>>>> #1 You need more information.
>>>>>>>
>>>>>>> a) Take a look at your .hprof file (the memory heap from the OOM) with
>>>>>>> an introspection tool like jhat, visualvm or java flight recorder and see
>>>>>>> what is using up your RAM.
>>>>>>>
>>>>>>> b) How big are your large rows? (Use nodetool cfstats on each node.)
>>>>>>> If your data model is bad, you are going to have to re-design it no
>>>>>>> matter what.
>>>>>>>
>>>>>>> #2 As a possible workaround, try using the G1GC collector with the
>>>>>>> settings from C* 3.0 instead of CMS. I've seen lots of success with it
>>>>>>> lately (tl;dr G1GC is much simpler than CMS and almost as good as a finely
>>>>>>> tuned CMS). *Note:* Use it with the latest Java 8 from Oracle. Do
>>>>>>> *not* set the newgen size; G1 sets it dynamically:
>>>>>>>
>>>>>>>> # min and max heap sizes should be set to the same value to avoid
>>>>>>>> # stop-the-world GC pauses during resize, and so that we can lock the
>>>>>>>> # heap in memory on startup to prevent any of it from being swapped out.
>>>>>>>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>>>>>>>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>>>>>>>
>>>>>>>> # Per-thread stack size.
>>>>>>>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>>>>>>>
>>>>>>>> # Use the Hotspot garbage-first collector.
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>>>>>>>
>>>>>>>> # Have the JVM do less remembered set work during STW, instead
>>>>>>>> # preferring concurrent GC. Reduces p99.9 latency.
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>>>>>>>
>>>>>>>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>>>>>>>> # Machines with > 10 cores may need additional threads.
>>>>>>>> # Increase to <= full cores (do not count HT cores).
>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>>>>>>>
>>>>>>>> # Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
>>>>>>>> # 200ms is the JVM default and lowest viable setting.
>>>>>>>> # 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>>>>>>>
>>>>>>>> # Do reference processing in parallel GC.
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>>>>>>>
>>>>>>>> # This may help eliminate STW.
>>>>>>>> # The default in Hotspot 8u40 is 40%.
>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>>>>>>>
>>>>>>>> # For workloads that do large allocations, increasing the region
>>>>>>>> # size may make things more efficient. Otherwise, let the JVM
>>>>>>>> # set this automatically.
>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>>>>>>>
>>>>>>>> # Make sure all memory is faulted and zeroed on startup.
>>>>>>>> # This helps prevent soft faults in containers and makes
>>>>>>>> # transparent hugepage allocation more effective.
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>>>>>>>
>>>>>>>> # Biased locking does not benefit Cassandra.
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>>>>>>>
>>>>>>>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>>>>>>>
>>>>>>>> # Enable thread-local allocation blocks and allow the JVM to automatically
>>>>>>>> # resize them at runtime.
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>>>>>>>
>>>>>>>> # http://www.evanjones.ca/jvm-mmap-pause.html
>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>>>>>>>
>>>>>>> All the best,
>>>>>>>
>>>>>>> Sebastián Estévez
>>>>>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>>>>>
>>>>>>> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I upgraded my instance from 8GB to a 14GB one.
>>>>>>>> Allocated 8GB to the jvm heap in cassandra-env.sh.
>>>>>>>>
>>>>>>>> And now, it crashes even faster with an OOM..
>>>>>>>>
>>>>>>>> Earlier, with a 4GB heap, I could get up to ~90% replication completion
>>>>>>>> (as reported by nodetool netstats); now, with an 8GB heap, I cannot even get
>>>>>>>> there. I've already restarted the cassandra service 4 times with the 8GB heap.
>>>>>>>>
>>>>>>>> No clue what's going on.. :(
>>>>>>>>
>>>>>>>> Kunal
>>>>>>>>
>>>>>>>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> You, and only you, are responsible for knowing your data and data model.
>>>>>>>>>
>>>>>>>>> If columns per row or rows per partition can be large, then an 8GB
>>>>>>>>> system is probably too small. But the real issue is that you need to keep
>>>>>>>>> your partition size from getting too large.
>>>>>>>>>
>>>>>>>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>>>>>>>> partitions, like under 10MB.
>>>>>>>>>
>>>>>>>>> -- Jack Krupansky
>>>>>>>>>
>>>>>>>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I'm new to cassandra.
>>>>>>>>>> How do I find those out? - mainly, the partition params that you
>>>>>>>>>> asked for. Others, I think I can figure out.
>>>>>>>>>>
>>>>>>>>>> We don't have any large objects/blobs in the column values - it's
>>>>>>>>>> all textual, date-time, numeric and uuid data.
>>>>>>>>>>
>>>>>>>>>> We use cassandra primarily to store segmentation data - with the
>>>>>>>>>> segment type as the partition key. That is again divided into two separate
>>>>>>>>>> column families, but they have a similar structure.
>>>>>>>>>>
>>>>>>>>>> Columns per row can be fairly large - each segment type is the
>>>>>>>>>> row key, with the associated user ids and timestamps as column values.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kunal
>>>>>>>>>>
>>>>>>>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> What does your data and data model look like - partition size,
>>>>>>>>>>> rows per partition, number of columns per row, any large values/blobs in
>>>>>>>>>>> column values?
>>>>>>>>>>>
>>>>>>>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>>>>>>>> partitions are reasonably small. Any large partitions could blow you away.
>>>>>>>>>>>
>>>>>>>>>>> -- Jack Krupansky
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>>>>>>>
>>>>>>>>>>>> Kunal
>>>>>>>>>>>>
>>>>>>>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Forgot to mention: the data size is not that big - it's barely
>>>>>>>>>>>>> 10GB in all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kunal
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a 2-node setup on Azure (east us region) running
>>>>>>>>>>>>>> Ubuntu server 14.04LTS. Both nodes have 8GB RAM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One of the nodes (the seed node) died with an OOM - so, I am trying
>>>>>>>>>>>>>> to add a replacement node with the same configuration.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is this new node also keeps dying with an OOM - I've
>>>>>>>>>>>>>> restarted the cassandra service like 8-10 times hoping that it would finish
>>>>>>>>>>>>>> the replication. But it didn't help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>>>>>>>> All nodes have a similar configuration - with libjna installed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cassandra is installed from datastax's debian repo - pkg: dsc21,
>>>>>>>>>>>>>> version 2.1.7.
>>>>>>>>>>>>>> I started off with the default configuration - i.e. the default
>>>>>>>>>>>>>> cassandra-env.sh - which calculates the heap size automatically
>>>>>>>>>>>>>> (1/4 * RAM = 2GB).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But that didn't help. So, I then tried to increase the heap
>>>>>>>>>>>>>> to 4GB manually and restarted. It still keeps crashing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any clue as to why it's happening?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Kunal
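
For reference, the manual heap override Kunal mentions above normally goes near the top of cassandra-env.sh; a minimal sketch, with the 4GB figure from this thread used purely as an illustration:

# cassandra-env.sh: uncomment and set both to pin the heap size
# (the stock 2.1 env.sh expects them to be set together).
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"    # young-gen size, used by the default CMS setup

With the G1-based env.sh Sebastian linked earlier, only MAX_HEAP_SIZE needs tweaking; G1 sizes the young generation dynamically.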