First I need to vent.
<rant>
One of my Cassandra clusters is a dual data center setup, with DC1
acting as the primary and DC2 acting as a hot backup.
Well, guess what? I am pretty sure it has fallen behind on
replication. So I am told I need to run repair.
I run repair (with -pr) on DC2. The first time I run it, it gets
*stuck* (i.e., frozen) within the first 30 seconds, with no error or
message of any sort. I then run it again -- and it completes in seconds
on each node, with about 50 gigs of data on each.
That seems suspicious, so I do some research.
I am told on IRC that repair -pr only repairs a node's primary range,
which in my setup amounts to just the 100-token offset between DC1 and
DC2… Seriously???
The repair process is, indeed, a joke:
https://issues.apache.org/jira/browse/CASSANDRA-5396 . Repair is the
worst thing you can do to your cluster: it consumes enormous resources
and can leave your cluster in an inconsistent state. Oh, and by the way,
you must run it every week… Whoever invented that process must not
live in the real world, with real applications.
</rant>
No… let's have a constructive conversation.
How do I know, with certainty, that DC2 is up to date on
replication? I have a few options:
1) I set read repair chance to 100% on critical column families and
write a tool to scan every CF, every column of every row. This strikes
me as very silly.
Q1: Do I need to scan every column, or is reading one column enough
to trigger a read repair?
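For option 1, I imagine something along these lines (CQL3, with a
made-up keyspace and column family name), plus a tool that then reads
every row back to give read repair a chance to run:

    ALTER TABLE prod.critical_cf WITH read_repair_chance = 1.0;
    -- and then, for each row, the scanning tool issues something like:
    SELECT * FROM prod.critical_cf WHERE key = 'some-row-key';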
2) Can someone explain to me how repair works, such that I don't
totally trash my cluster or spill into the work week?
Is there any improvement and clarity in 1.2? How about 2.0?
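For reference, here is the procedure I *think* I am supposed to follow
(hostnames and keyspace are made up) -- please correct me if I have it
wrong:

    # run a primary-range repair on every node in BOTH data centers,
    # one node at a time, letting each finish before starting the next
    for host in dc1-node1 dc1-node2 dc2-node1 dc2-node2; do
        nodetool -h "$host" repair -pr my_keyspace
    done

    # meanwhile, watch progress on the node being repaired:
    nodetool -h dc1-node1 compactionstats   # validation compactions
    nodetool -h dc1-node1 netstats          # streaming between replicas

And, as I understand it, the whole thing has to complete within
gc_grace_seconds (10 days by default), which is where the "run it every
week" advice comes from.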
--
Regards,
Oleg Dulin
http://www.olegdulin.com