I am trying to minimize my SSTable count to help cut down my read latency.  I 
have some very beefy boxes for my Cassandra nodes (96 gigs of memory each).  I 
think this gives me a lot of flexibility to cut the SSTable count by setting 
memtable throughput very high.
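
To put numbers on that (my own back-of-envelope math, not measurements): every 
memtable flush produces one SSTable, so until compaction catches up, the number 
of SSTables generated for a given volume of writes scales inversely with the 
throughput setting.  A quick sketch (the 64 MB figure is just the commonly 
cited default, used for illustration):

    // Back-of-envelope: memtable flushes (and thus new SSTables,
    // before compaction) per terabyte written, at different
    // memtable throughput settings.
    public class FlushMath {
        public static void main(String[] args) {
            long written = 1L << 40;  // 1 TB of writes
            for (long mb : new long[] { 64, 2048, 8192 }) {
                long flushes = written / (mb << 20);
                System.out.println(mb + " MB memtable -> "
                        + flushes + " flushes per TB written");
            }
        }
    }

At 8 gigs that's 128 flushes per TB written instead of ~16k at the default, 
which is the whole reason I want to push the setting this high.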

While experimenting with this, I found a bug where you can't configure memtable 
throughput past 2 gigs without an integer overflow screwing up the flushes.  
That makes me feel like I'm in uncharted territory :).  I'm guessing the 
standard answer to too many SSTables is "get more boxes," but given the specs, 
I'm hoping I can squeeze a lot more juice out of the ones I have.
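
For what it's worth, my guess at the mechanics (an assumption on my part; I 
haven't traced the actual code path) is that the MB setting gets multiplied out 
to bytes in a signed 32-bit int, which wraps negative past 2 gigs:

    // Hypothetical illustration of the overflow I'm guessing at, not
    // the actual Cassandra code: converting an MB setting to bytes in
    // a signed 32-bit int wraps negative once it exceeds 2 gigs.
    public class OverflowDemo {
        public static void main(String[] args) {
            int throughputMB = 3072;                 // "3 gigs", past the limit
            int bytes = throughputMB * 1024 * 1024;  // exceeds Integer.MAX_VALUE
            System.out.println(bytes);               // prints -1073741824
        }
    }

A negative flush threshold would explain the flushes misbehaving.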

I had wanted to set the throughput to 8 gigs for my column family (I only have 
one) and set my heap to 30 gigs (still leaving 66 gigs for file cache).  When 
this failed due to the int overflow, I partitioned the column family into 4 
column families (a simple mod operation on the row key when I save and 
retrieve, sketched below) and set each to 2 gigs of throughput to get the same 
aggregate behavior.
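
The routing is nothing fancy; roughly this (a hypothetical sketch, with the 
class, helper, and CF names made up by me, not any real client API):

    // Hypothetical sketch of the mod-based routing between the four
    // column families; CfRouter, pick, and the CF names are my own.
    public class CfRouter {
        private static final String[] CFS =
                { "Data0", "Data1", "Data2", "Data3" };

        // Pick the same column family for a given row key on both
        // the write path and the read path.
        public static String pick(byte[] rowKey) {
            int h = java.util.Arrays.hashCode(rowKey);
            // Double mod so negative hash values still land in range.
            return CFS[((h % CFS.length) + CFS.length) % CFS.length];
        }
    }

As long as saves and retrieves go through the same function, it behaves like 
one logical column family split four ways.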

I did some quick write stress tests, and everything seems to perform well and 
remain stable.  Memory usage also looked steady.  However, I am nervous, since 
the defect makes me think most people are running with much smaller memory 
loads.  Has anyone had experience with Cassandra using this much memory?  Does 
anyone see any pitfalls I'm missing?  If not, I'll let you guys know if I 
learn anything interesting!
