Hi all

We're still on 0.6 and are facing problems with repairs.

Specifically, a repair for one CF takes around 60h, and we have to do that
twice to cover all token ranges (RF=3, 5 nodes). During that time the cluster
is under pretty heavy IO load. It kind of works, but during peak times we see
lots of dropped messages (including writes). So we are actually creating new
inconsistencies while trying to fix the existing ones with the repair.

Since we already have a very simple Hadoop-ish framework in place that lets us
do token-range walks with multiple workers and restart at a given position in
case of failure, I created a simple worker that reads everything with CL_ALL.
With only one worker, and with almost no performance impact, one scan took 7h.

My understanding is that, thanks to read repair, this gets me the same result
I would have achieved with the repair runs.

Is that true or am I missing something?

Cheers,
Daniel
