Thanks Edward. I'm told by our IT that the switch connecting the nodes
is pretty fast.
Seriously, in my house I copy complete DVD images from my bedroom to
the living room downstairs via WiFi, and a dozen GB does not seem like a
problem, on dirt cheap hardware (Patriot Box Office).
I also have just _one_ major column family, but caveat emptor -- 8 indexes
attached to it (and there will be more). There is one accounting CF which is
small and can't possibly make a difference.
By contrast, compaction (as in nodetool) performs quite well on this
cluster. I am starting to suspect some sort of malfunction.
Looking at the system log during the "repair", there is some compaction
agent doing work that I'm not sure makes sense (and I didn't ask for it).
Disk utilization suddenly goes up to 40% per Ganglia and stays there, which
is pretty silly considering the cluster is IDLE and we have SSDs: no
external writes, no reads. There are occasional GC pauses, but those I can
live with.
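For the record, this is roughly how I have been watching it (the host is a
placeholder, and compactionstats may not exist in every nodetool version):

    # if that mystery compaction agent is the validation compaction that
    # repair runs to build its Merkle trees, it should show up here
    nodetool -h <node> compactionstats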
This repair debacle has now happened twice in a row. Cr@p. I need to go to
production soon, and that doesn't look good at all. If I can't manage a
system this simple (and/or get help on this list) I may have to cut my
losses, i.e. stay with Oracle.
Regards,
Maxim
On 7/19/2011 12:16 PM, Edward Capriolo wrote:
Well, most SSDs are pretty fast. There is one more thing to consider. If
Cassandra determines that nodes are out of sync, it has to transfer data
across the network. If that is the case, you have to look at 'nodetool
streams' and determine how much data is being transferred between nodes.
There are some open tickets where, with larger tables, repair streams more
than it needs to. But even if the transfers are only 10% of your 200 GB,
transferring 20 GB is not trivial.
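Something along these lines (host is a placeholder; depending on your
nodetool version the streaming view may live under 'netstats' instead):

    # show active streams to/from the node and how much data is left
    nodetool -h <node> streams

    # some versions report the same thing via netstats
    nodetool -h <node> netstats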
If you have multiple keyspaces and column families, repairing them one at a
time might make the process more manageable.
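Something like this, assuming your nodetool accepts keyspace/column family
arguments to repair (the names below are made up):

    # repair a single keyspace
    nodetool -h <node> repair MyKeyspace

    # or just one column family within it
    nodetool -h <node> repair MyKeyspace MyBigCF

That way a hiccup partway through only costs you one column family's worth
of work instead of the whole run.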