Re: Scaling problems

2010-05-23 Thread Jonathan Ellis
No, it's really not designed to be a "leave the nodes down while I do a ton of inserts" mechanism: (a) the HH schema creates a column per hinted row, so you'll hit the 2GB row limit sooner or later, and (b) it goes through the hints hourly in case it missed a gossip Up notification. On Sat, May 22, 2010 at 9:07 PM,

Re: Scaling problems

2010-05-22 Thread Ian Soboroff
I'll try this. HH backs up because nodes are failing. I haven't read the code, but why should HH suck CPU? As I understand it, there's nothing to hand off until the destination comes back up, and Gossip should tell us that, no? In the interim, it's just a cache of writes waiting to be sent. Is

Re: Scaling problems

2010-05-21 Thread Jonathan Ellis
On Fri, May 21, 2010 at 9:09 AM, Ian Soboroff wrote:
> HINTED-HANDOFF-POOL   1   158 23
This is your smoking gun. HH tasks suck a ton of CPU and you have 158 backed up. I would just blow the HH files away from your data/system directory, restart the node, and run rep
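As a rough illustration of how to read that line, here is a minimal Python sketch that parses tpstats-style rows and flags pools with a backlog. It assumes the three numbers are the active, pending, and completed counts in that order (which matches the "158 backed up" reading in the reply). The names `parse_tpstats` and `backed_up` and the threshold of 100 are hypothetical, chosen only for this example.

```python
from typing import Dict, List, NamedTuple


class PoolStats(NamedTuple):
    active: int
    pending: int
    completed: int


def parse_tpstats(text: str) -> Dict[str, PoolStats]:
    """Parse rows shaped like 'POOL-NAME  active  pending  completed'.

    Assumption: the numeric columns are active, pending, completed, in that
    order. Lines that do not fit this shape (headers, blanks) are skipped.
    """
    pools: Dict[str, PoolStats] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        name, counts = parts[0], parts[1:]
        if not all(c.isdigit() for c in counts):
            continue
        pools[name] = PoolStats(*(int(c) for c in counts))
    return pools


def backed_up(pools: Dict[str, PoolStats], threshold: int = 100) -> List[str]:
    """Return the names of pools whose pending count is at or above threshold."""
    return sorted(name for name, s in pools.items() if s.pending >= threshold)


# The single row quoted in the message above.
sample = "HINTED-HANDOFF-POOL   1   158 23"
stats = parse_tpstats(sample)
print(backed_up(stats))
```

The same check would apply to any other pool in the output, since the parser does not special-case the hinted-handoff pool.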

Re: Scaling problems

2010-05-21 Thread Ian Soboroff
Ok, spending some time slogging through the cassandra-user archives. Seems lots of folks have this problem. Starting with a JVM upgrade, then skimming through JIRA looking for patches. Ian On Fri, May 21, 2010 at 12:09 PM, Ian Soboroff wrote: > So at the moment, I'm not running my loader, and

Re: Scaling problems

2010-05-21 Thread Ian Soboroff
So at the moment, I'm not running my loader, and I'm looking at one node which is slow to respond to nodetool requests. At this point, it has a pile of hinted-handoffs pending which don't seem to be draining out. The system.log shows that it's GCing pretty much constantly. Ian $ /usr/local/src/

Re: Scaling problems

2010-05-21 Thread Ian Soboroff
On the to-do list for today. Is there a tool to aggregate all the JMX stats from all nodes? I mean, something a little more complete than nagios. Ian On Fri, May 21, 2010 at 10:23 AM, Jonathan Ellis wrote: > you should check the jmx stages I posted about > > On Fri, May 21, 2010 at 7:05 AM, I
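One low-tech way to get a cluster-wide view, sketched in Python: collect the per-node pending counts however is convenient (for example from each node's tpstats output) and merge them into totals plus a "worst node" per pool. This is not an existing Cassandra tool. The function name `aggregate_pending`, the node names, and every number except the 158 quoted elsewhere in the thread are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple


def aggregate_pending(
    per_node: Dict[str, Dict[str, int]]
) -> Tuple[Dict[str, int], Dict[str, Tuple[str, int]]]:
    """Merge {node: {pool: pending}} into cluster totals and the worst node per pool."""
    totals: Dict[str, int] = defaultdict(int)
    worst: Dict[str, Tuple[str, int]] = {}
    for node, pools in per_node.items():
        for pool, pending in pools.items():
            totals[pool] += pending
            # Remember the node with the largest backlog for this pool.
            if pool not in worst or pending > worst[pool][1]:
                worst[pool] = (node, pending)
    return dict(totals), worst


# Hypothetical input: node names and most counts are invented.
per_node = {
    "node01": {"HINTED-HANDOFF-POOL": 158, "ROW-MUTATION-STAGE": 0},
    "node02": {"HINTED-HANDOFF-POOL": 0, "ROW-MUTATION-STAGE": 12},
}
totals, worst = aggregate_pending(per_node)
for pool in sorted(totals):
    node, pending = worst[pool]
    print(f"{pool}: total pending={totals[pool]}, worst={node} ({pending})")
```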

Re: Scaling problems

2010-05-21 Thread Jonathan Ellis
you should check the jmx stages I posted about On Fri, May 21, 2010 at 7:05 AM, Ian Soboroff wrote: > Just an update.  I rolled the memtable size back to 128MB.  I am still > seeing that the daemon runs for a while with reasonable heap usage, but then > the heap climbs up to the max (6GB in this

Re: Scaling problems

2010-05-21 Thread Ian Soboroff
Just an update. I rolled the memtable size back to 128MB. I am still seeing that the daemon runs for a while with reasonable heap usage, but then the heap climbs up to the max (6GB in this case, should be plenty) and it starts GCing, without much getting cleared. The client catches lots of excep

Re: Scaling problems

2010-05-20 Thread Ian Soboroff
Excellent leads, thanks. cassandra.in.sh has a heap of 6GB, but I didn't realize that I was trying to float so many memtables. I'll poke tomorrow and report if it gets fixed. Ian On Thu, May 20, 2010 at 10:40 AM, Jonathan Ellis wrote: > Some possibilities: > > You didn't adjust Cassandra heap
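A back-of-envelope sketch of the "too many memtables" concern, in Python. It assumes that each column family can hold one memtable of the configured size in memory at once, and that the in-heap footprint is some multiple of the configured size. The column family count and the overhead factor below are hypothetical placeholders, not values from this thread; only the 6GB heap and the 128MB memtable size appear in the messages.

```python
def memtable_heap_estimate_mb(
    memtable_mb: float, column_families: int, overhead_factor: float
) -> float:
    """Estimate heap consumed by live memtables, in MB.

    Assumptions (not from the thread): one live memtable per column family,
    and in-heap size = configured size * overhead_factor.
    """
    return memtable_mb * column_families * overhead_factor


heap_mb = 6 * 1024  # 6GB heap, as stated in the thread
estimate_mb = memtable_heap_estimate_mb(
    memtable_mb=128,       # memtable size mentioned later in the thread
    column_families=10,    # hypothetical
    overhead_factor=3.0,   # hypothetical
)
fits = estimate_mb < heap_mb
print(f"estimated memtable heap: {estimate_mb:.0f} MB of {heap_mb} MB, fits={fits}")
```

Plugging in the real column family count and a measured overhead gives a quick sanity check before changing the memtable size or the heap setting.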

Re: Scaling problems

2010-05-20 Thread Jonathan Ellis
Some possibilities:
You didn't adjust Cassandra heap size in cassandra.in.sh (1GB is too small)
You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will show large pending ops -- large = 100s)
You're creating large rows a bit at a time and Cassandra OOMs when it tries to compact (the oom sh

Scaling problems

2010-05-18 Thread Ian Soboroff
I hope this isn't too much of a newbie question. I am using Cassandra 0.6.1 on a small cluster of Linux boxes - 14 nodes, each with 8GB RAM and 5 data drives. The nodes are running HDFS to serve files within the cluster, but at the moment the rest of Hadoop is shut down. I'm trying to load a lar