Re: repair takes two days, and ends up stuck: stream at 1096% (yes, really)

2010-11-14 Thread Chip Salzenberg
> On Sun, Nov 14, 2010 at 3:49 PM, Chip Salzenberg wrote:
> > My by-now infamous eight-node cluster running 0.7.0beta3+ dropped many replication MUTATEs during load, so I decided to fix replication copies with a "nodetool repair" on one of the n

repair takes two days, and ends up stuck: stream at 1096% (yes, really)

2010-11-14 Thread Chip Salzenberg
My by-now infamous eight-node cluster running 0.7.0beta3+ dropped many replication MUTATEs during load, so I decided to fix replication copies with a "nodetool repair" on one of the nodes (X.21). The repair has been running for two days, and has finally gotten itself wedged into a state where it c

Gossip yoyo under write load

2010-11-12 Thread Chip Salzenberg
After I rebooted my 0.7.0beta3+ cluster to increase threads (read=100 write=200 ... they're beefy machines) and put them under load again, I find gossip reporting yoyo up-down-up-down status for the other nodes. Anyone know what this is a symptom of, and/or how to avoid it? I haven't seen an

Re: node won't leave

2010-11-08 Thread Chip Salzenberg
On Sun, Nov 7, 2010 at 11:58 PM, Reverend Chip wrote:
> Is there an existing tool to just read everything from every node, just to force a read repair on everything?
"nodetool repair", of course. me-- for getting FAQ and mailing list out of order.

node won't leave

2010-11-05 Thread Chip Salzenberg
In the below "nodetool ring" output, machine 18 was told to loadbalance over an hour ago. It won't actually leave the ring. When I first told it to loadbalance, the cluster was under heavy write load; I've turned off the write load, but the node won't actually leave, still. Help? (It also colle