Hello, I saw this thread yesterday but didn't want to reply because I didn't know what the cause was.
Basically I was using wide rows with Cassandra 1.x and was inserting data constantly. After about 18 hours the JVM would crash with a dump file. At some point I removed the compaction throttling and the problem disappeared. I never really found out what the root cause was.

On Thu Dec 04 2014 at 2:49:57 AM Gianluca Borello <gianl...@draios.com> wrote:

> Thanks Robert, I really appreciate your help!
>
> I'm still unsure why Cassandra 2.1 seems to perform much better in that
> same scenario (even with the same values of compaction threshold and
> number of compactors), but I guess we'll revisit that when we decide to
> upgrade to 2.1 in production.
>
> On Dec 3, 2014 6:33 PM, "Robert Coli" <rc...@eventbrite.com> wrote:
> >
> > On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello <gianl...@draios.com> wrote:
> >>
> >> We mainly store time series-like data, where each data point is a
> >> binary blob of 5-20 KB. We use wide rows, and try to put in the same row
> >> all the data that we usually need in a single query (but not more than
> >> that). As a result, our application logic is very simple (since on average
> >> we only have to do one query to read the data) and read/write response
> >> times are very satisfactory. This is a cfhistograms and a cfstats of our
> >> heaviest CF:
> >
> > 100 MB is not HYOOOGE but is around the size where large rows can cause
> > heap pressure.
> >
> > You seem to be unclear on the implications of pending compactions,
> > however.
> >
> > Briefly, pending compactions indicate that you have more SSTables than
> > you "should". As compaction both merges row versions and reduces the number
> > of SSTables, a high number of pending compactions causes problems
> > associated with both having too many row versions ("fragmentation") and a
> > large number of SSTables (per-SSTable heap/memory overhead, depending on
> > version, such as bloom filters and index samples). In your case, it seems
> > the problem is probably just the compaction throttle being too low.
> >
> > My conjecture is that, given your normal data size and read/write
> > workload, you are relatively close to "GC pre-fail" when compaction is
> > working. When it stops working, you relatively quickly get into a state
> > where you exhaust the heap because you have too many SSTables.
> >
> > =Rob
> > http://twitter.com/rcolidba
> >
> > PS - Given 30 GB of RAM on the machine, you could consider investigating
> > "large-heap" configurations; rbranson from Instagram has some slides out
> > there on the topic. What you pay is longer stop-the-world GCs, IOW latency
> > if you happen to be talking to a replica node when it pauses.
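For anyone hitting the same thing, here is a minimal sketch of how the compaction backlog and throttle discussed above can be inspected and adjusted with nodetool. The keyspace/CF names are placeholders, not the ones from this thread, and the throughput values are illustrative only:

    # Check how many compactions are pending/running on the node
    nodetool compactionstats

    # Look at row sizes and SSTable counts for the heavy column family
    # ("mykeyspace" / "mycf" are hypothetical names)
    nodetool cfstats mykeyspace
    nodetool cfhistograms mykeyspace mycf

    # Raise or remove the compaction throughput cap at runtime
    # (0 disables throttling entirely; pick a sensible limit for production)
    nodetool setcompactionthroughput 0

    # Equivalent persistent setting in cassandra.yaml:
    # compaction_throughput_mb_per_sec: 0

Disabling the throttle trades more I/O during compaction for keeping the SSTable count (and the associated heap overhead) under control, which matches what both posters saw.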