I have a 30+ node cluster that is under heavy read and write load.  We never 
delete data; everything is inserted with a TTL and is fairly short-lived unless 
it is upserted.  We are also fine with consistency level ONE plus the read 
repair chance.  Given all that, we elected to never run repair.  The reasoning 
is that the data is so short-lived that it would simply expire and vanish 
through normal compaction.  We also try to always write full rows so that reads 
do not have to reassemble a row from multiple writes.  Are there any 
consequences of this strategy that we should be aware of?  We don’t even run 
repair when adding nodes to the cluster; we just wait for the data to expire 
via TTL and be compacted away.
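
To make the reasoning concrete, here is a toy sketch in plain Python. It is 
not Cassandra code and uses no Cassandra API; the Replica class, the 
divergent_keys helper and the timings are all hypothetical, and it assumes 
every write carries a TTL and nothing is ever explicitly deleted.  One replica 
misses an upsert, and the replicas disagree only until the TTLs run out.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical in-memory stand-in for one node: key -> (value, written_at, ttl)
    cells: dict = field(default_factory=dict)

    def write(self, key, value, now, ttl):
        # Full-row overwrite; every write carries a TTL (assumption from the post)
        self.cells[key] = (value, now, ttl)

    def read(self, key, now):
        cell = self.cells.get(key)
        if cell is None:
            return None
        value, written_at, ttl = cell
        # An expired cell is treated as absent
        return value if now < written_at + ttl else None


def divergent_keys(replicas, keys, now):
    # Keys for which the replicas would return different answers at time `now`
    return [k for k in keys if len({r.read(k, now) for r in replicas}) > 1]


def simulate():
    a, b, c = Replica(), Replica(), Replica()
    # t=0: the first write reaches all three replicas
    for r in (a, b, c):
        r.write("k1", "v1", now=0, ttl=100)
    # t=10: the upsert reaches only a and b (c missed it)
    for r in (a, b):
        r.write("k1", "v2", now=10, ttl=100)
    before = divergent_keys([a, b, c], ["k1"], now=50)   # c still serves v1
    after = divergent_keys([a, b, c], ["k1"], now=200)   # everything has expired
    return before, after
```

In this model the window of disagreement is bounded by the TTL, which is the 
property the no-repair strategy relies on.  It says nothing about other 
effects repair might have on a real cluster.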

Based on everything I’ve read, running repair really only helps with two 
things: consistency (which we don’t care about, because data is updated so 
often that being one update behind is fine) and preventing deleted data from 
reappearing (and we never delete; we always use TTLs).  Is there some other 
reason to run repair that we are not aware of?

Wayne
