Good points, Aaron. I realize now how expensive read repair is. I'm
going to keep running repairs regularly, but still set a max TTL on all
columns to make sure we don't have really old data we no longer need
getting buried in the cluster.
On , aaron morton <aa...@thelastpickle.com> wrote:
Read repair will only repair data that is read on the nodes that are up
at that time, and does not guarantee that any changes it detects will be
written back to the nodes. The diff mutations are async fire and forget
messages which may go missing or be dropped or ignored by the recipient
just like any other message.
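To make that concrete, here is a toy Python model of the point above. It is
not Cassandra code: the replica names, the dict layout and the `read_repair`
helper are all made up for illustration. Repair mutations only go to replicas
that took part in the read, and a dropped mutation is simply lost.

```python
def read_repair(replicas, up, key, dropped=frozenset()):
    """Toy model: replicas maps node name -> {key: (timestamp, value)}."""
    # Only nodes that are up take part in the read.
    answers = {node: replicas[node].get(key) for node in up}
    seen = [answer for answer in answers.values() if answer is not None]
    if not seen:
        return None
    newest = max(seen)  # highest timestamp wins
    for node, answer in answers.items():
        if answer != newest and node not in dropped:
            # Fire and forget: no ack, and no retry if the message is lost.
            replicas[node][key] = newest
    return newest[1]


replicas = {
    "n1": {"k": (2, "new")},
    "n2": {"k": (1, "old")},
    "n3": {"k": (1, "old")},
}
# n3 is down during the read, and the mutation sent to n2 is dropped.
value = read_repair(replicas, up=["n1", "n2"], key="k", dropped={"n2"})
```

The read returns the newest value, but in this run n2 and n3 both still hold
the old one, which is the gap a scheduled repair is meant to close.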
Also, getting hit with a bunch of read repair operations is pretty
painful. The normal read runs, the coordinator detects the digest
mismatch, the read runs again from all nodes and they all have to return
their full data (no digests this time), the coordinator detects the
diffs, and mutations are sent back to each node that needs them. All this
happens synchronously with the read request when the CL > ONE. That's 2
reads with more network IO and up to RF mutations.
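A rough way to count that extra work is the sketch below. It is a toy model
only: `read_cost` and its parameters are hypothetical names, and it assumes
"all nodes" means all RF replicas and that the first pass is one data read
plus digests from the other replicas asked.

```python
def read_cost(rf, cl, stale_replicas):
    """Count the work for one read in a toy model of the flow above.

    rf: replication factor
    cl: number of replicas asked on the first pass
    stale_replicas: replicas that turn out to need a mutation (0 = no mismatch)
    """
    if not 1 <= cl <= rf:
        raise ValueError("cl must be between 1 and rf")
    if not 0 <= stale_replicas <= rf:
        raise ValueError("stale_replicas must be between 0 and rf")
    # First pass: one full data read, digests from the other replicas asked.
    cost = {"read_rounds": 1, "data_reads": 1, "digest_reads": cl - 1, "mutations": 0}
    if stale_replicas == 0:
        return cost  # digests matched, nothing else to do
    # Digest mismatch: second pass, full data from every replica (assumption).
    cost["read_rounds"] = 2
    cost["data_reads"] += rf
    # One mutation per replica that needs it, so at most rf.
    cost["mutations"] = stale_replicas
    return cost


clean = read_cost(rf=3, cl=2, stale_replicas=0)
repaired = read_cost(rf=3, cl=2, stale_replicas=2)
```

With RF 3 and two replicas asked, the clean read costs one data read and one
digest; the repaired read adds a second round with three more data reads and
two mutations before the client gets its answer.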
The delete thing is important but repair also reduces the chance of reads
getting hit with RR and gives me confidence when it's necessary to nuke a
bad node.
Your plan may work but it feels risky to me. You may end up with worse
read performance and unpleasant emotions if you ever have to nuke a node.
Others may disagree.
I'm not ignoring the fact that repair can take a long time, fail, hurt
performance, etc. There are plans to improve it, though.
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 22 Jul 2011, at 19:55, jonathan.co...@gmail.com wrote:
> One of the main reasons for regularly running repair is to make sure
> deletes are propagated in the cluster, i.e., data is not resurrected if a
> node never received the delete call.
>
> And repair-on-read takes care of repairing inconsistencies "on-the-fly".
>
> So if I were to set a universal TTL on all columns - so everything
> would only live for a certain age, would I be able to get away without
> having to do regular repairs with nodetool?
>
> I realize this scenario would not be applicable for everyone, but our
> data model would allow us to do this.
>
> So could this be an alternative to running the (resource-intensive,
> long-running) repairs with nodetool?
>
> Thanks.