I want to check in.  I'm sad, mad, and afraid.  I've been trying to get a 1.2 
cluster up and working with my data set for three weeks with no success.  I've 
been running a 1.1 cluster for eight months now with no hiccups, but for me at 
least, 1.2 has been a disaster.  I had high hopes for leveraging the new 
features of 1.2, specifically vnodes and collections.  But at this point I 
can't release my system into production, and will probably need to find a new 
back end.  As a small startup, this could be catastrophic.  I'm mostly mad at 
myself.  I took a risk moving to the new tech.  I forgot that sometimes when 
you gamble, you lose.

First, the performance of 1.2.6 was horrible when using collections.  I wasn't 
able to push through 500k rows before the cluster became unusable.  After a lot 
of digging, and way too much time, I discovered I was hitting a bug that had 
just been fixed but was unreleased.  This scared me, because the release was 
already at 1.2.6 and I would have expected something like 
https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been addressed 
long before.  But gamely I grabbed the latest code from the 1.2 branch, built 
it, and was finally able to get past half a million rows.

But then I hit ~4 million rows, and a multitude of problems.  Even with the 
fix above, I was still seeing a ton of compactions failing, specifically the 
ones for large rows.  Not a single large row will compact; they all fail an 
assertion about the row size.  Worse, and this is what kills the whole thing, 
I keep hitting a wall with open files, even after dumping the whole DB, 
dropping vnodes, and trying again.  Seriously, 650k open file descriptors?  
When it hits this limit, the whole DB craps out and is basically unusable.  
This isn't that many rows.  I have close to half a billion in 1.1…
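For anyone who wants to double-check my numbers, this is roughly how I'm counting descriptors (a minimal sketch for Linux; the pgrep pattern and /proc layout are assumptions about a stock install):

```shell
# Find the Cassandra JVM (assumes the default daemon class name is in the command line)
pid=$(pgrep -f CassandraDaemon | head -n 1)

# Count the process's open file descriptors via /proc
ls "/proc/$pid/fd" | wc -l

# Compare against the per-process limit the JVM actually got
grep 'Max open files' "/proc/$pid/limits"
```

The limit the running process reports can differ from what's in limits.conf, which is why I check /proc directly.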

I'm now at a standstill.  I figure I have two options, unless someone here can 
help me, and neither of them involves 1.2.  I can either go back to 1.1 and 
remove the features that collections added to my service, or I can find 
another data backend that has similar performance characteristics to Cassandra 
but allows collection-type behavior in a scalable manner.  Because as far as I 
can tell, 1.2 doesn't scale.  Which makes me sad; I was proud of what I 
accomplished with 1.1…

Does anyone know why there are so many open file descriptors?  Any ideas on why 
a large row won't compact?
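In case it helps anyone diagnose, this is how I've been pulling the compaction failures out of the logs (the log path is the packaged-install default and is an assumption; adjust for your setup):

```shell
# Default packaged-install log location; yours may differ
log=/var/log/cassandra/system.log

# How many assertion failures are in the log?
grep -c 'AssertionError' "$log"

# Show the most recent ones with a bit of surrounding stack trace
grep -A 8 'AssertionError' "$log" | tail -n 40
```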

Paul
