Thanks for the explanation, Alain - very helpful!

From: Alain RODRIGUEZ <arodr...@gmail.com>
Date: Thursday, April 27, 2017 at 6:12 AM
To: <user@cassandra.apache.org>
Subject: Re: Why are automatic anti-entropy repairs required when hinted 
hand-off is enabled?

"It happened to me in the future in a bad way, and nothing prevent it from happening in the future"

Obviously that should read "It happened to me in the past in a bad way". Thinking faster than I write... I am quite slow writing :p.

To be clear I recommend:

  *   to run repairs within gc_grace_seconds when performing deletes (not TTLs; data that only expires through TTL is fine) - see the sketch below
  *   to run repairs 'regularly' even when not deleting data (how often depends on data size and the consistency levels in use)
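
A minimal sketch of what "running repairs within gc_grace_seconds" can look like in practice. The keyspace name and the schedule below are made up for illustration; the only requirement is that every node completes a repair of its data at least once per gc_grace_seconds window (10 days by default):

    # Run on every node, with staggered start times across the cluster.
    # -pr repairs only the token ranges this node owns as primary, so running
    # the same command on each node covers the whole ring exactly once.
    # -full forces a full (non-incremental) repair.
    nodetool repair -full -pr my_keyspace

    # Illustrative cron entry (every 3 days, well inside the 10-day default):
    # 0 2 */3 * *  nodetool repair -full -pr my_keyspace
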
Hope that helps,
-----------------------
Alain Rodriguez - @arodream - 
al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-04-27 13:07 GMT+02:00 Alain RODRIGUEZ 
<arodr...@gmail.com>:
Hi,

To put it simply, I have been taught that anything that can be disabled is an optimization, so we don't want to rely on an optimization that can silently fail. This goes for read repair as well, since we cannot be sure that all the data will be read. Plus, by default it is configured to trigger only 10% of the time, and only within the local data center.
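
For reference, those defaults are per-table options in the 2.x/3.x versions discussed here; a small example with a made-up table name:

    -- dclocal_read_repair_chance: chance that a read also triggers a read repair
    --   against the other replicas in the local DC only (default 0.1, the "10%" above)
    -- read_repair_chance: same, but across all data centers (default 0.0)
    ALTER TABLE my_keyspace.my_table
      WITH dclocal_read_repair_chance = 0.1
       AND read_repair_chance = 0.0;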

(Anti-entropy) Repairs are known to be necessary to make sure data is correctly distributed to all the nodes that are supposed to have it.

As Cassandra is built for native tolerance to failure (when correctly configured to do so), it can happen, by design, that a node misses some data.

When the data that missed a node is a tombstone created by a delete, it needs to be replicated to that node before all the other nodes remove it, which eventually happens after 'gc_grace_seconds' (detailed post about this: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html). If the tombstone is removed from all the nodes before it has been replicated to the node that missed it, that node will eventually re-replicate the data that should have been deleted - the data shadowed by the tombstone. We call it a zombie.
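
To make the timing concrete, gc_grace_seconds is a per-table setting; a small sketch with a made-up table name and the default value:

    -- 864000 seconds = 10 days (the default). A tombstone written at time T may be
    -- purged from a replica any time after T + gc_grace_seconds, so it must have
    -- reached every replica (via repair, hints or read repair) before that deadline,
    -- otherwise the deleted data can come back as a zombie.
    ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 864000;

    -- Check the current value:
    SELECT gc_grace_seconds FROM system_schema.tables
      WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';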

And hinted handoff can and will fail. It happened to me in the future in a bad way, and nothing prevents it from happening in the future, even if hints were greatly improved in 3.0+.

From the DataStax doc (https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesHintedHandoff.html): "Hints are flushed to disk every 10 seconds, reducing the staleness of the hints."

Which means that, by design, a node going down can lose up to 10 seconds of hints it was storing for other nodes (some of which might be deletes).
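
For reference, the hint-related knobs live in cassandra.yaml; the values below are the Cassandra 3.x defaults (the directory path is only illustrative):

    # cassandra.yaml (3.x defaults)
    hinted_handoff_enabled: true       # store hints for replicas that fail to acknowledge a write
    max_hint_window_in_ms: 10800000    # 3 hours: stop collecting hints for a node down longer than this
    hints_flush_period_in_ms: 10000    # the "flushed to disk every 10 seconds" mentioned above
    hints_directory: /var/lib/cassandra/hints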

The conclusion is often the same: if you are not running deletes, or if zombie data is not an issue for you, it is quite safe not to run repairs within 'gc_grace_seconds' (default 10 days). But repair is, as of now, the only way to ensure low entropy for regular data (not only tombstones) in a Cassandra cluster; all the other optimizations can and will fail at some point. It also provides better consistency when reading with a weak consistency level such as LOCAL_ONE: as repair reduces entropy, the chance of reading the same data everywhere increases.

C*heers,
-----------------------
Alain Rodriguez - @arodream - 
al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-04-21 15:54 GMT+02:00 Thakrar, Jayesh 
<jthak...@conversantmedia.com>:

Unfortunately, I don't know much about the replication architecture. The only thing I know is that replication is set at the keyspace level (i.e. 1, 2, 3 or N replicas), and then there is the consistency level, set at the client application level, which determines how many acknowledgements are necessary to deem a write successful.
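
A minimal illustration of those two settings (keyspace, table and data center names are made up): the replication factor is declared on the keyspace, while the consistency level is chosen per request by the client:

    -- Replication factor is a property of the keyspace:
    CREATE KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

    CREATE TABLE my_keyspace.my_table (id int PRIMARY KEY, value text);

    -- Consistency level is chosen by the client, e.g. in cqlsh:
    CONSISTENCY LOCAL_QUORUM;
    INSERT INTO my_keyspace.my_table (id, value) VALUES (1, 'x');
    -- With RF 3 and LOCAL_QUORUM, the write is acknowledged once 2 of the 3
    -- local replicas confirm it; the remaining replica is updated asynchronously.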

And you might have noticed in the video that anti-entropy repair is to be run as "deemed" necessary, not blindly as a rule. E.g. if your data is read-only (never mutated), then there is no need for anti-entropy repair.

From: eugene miretsky 
<eugene.miret...@gmail.com>
Date: Thursday, April 20, 2017 at 5:52 PM
To: Conversant 
<jthak...@conversantmedia.com>
Cc: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Why are automatic anti-entropy repairs required when hinted 
hand-off is enabled?

Thanks Jayesh,

Watched all of those.

Still not sure I fully get the theory behind it

Aside from the 2 failure cases I mentioned earlier, the only other way data can become inconsistent is an error when replicating the data in the background. Does Cassandra have a retry policy for internal replication? Is there a setting to change it?

On Thu, Apr 6, 2017 at 10:54 PM, Thakrar, Jayesh 
<jthak...@conversantmedia.com> wrote:
I had asked a similar/related question - on how to carry out repair, etc and 
got some useful pointers.
I would highly recommend the youtube video or the slideshare link below (both 
are for the same presentation).

https://www.youtube.com/watch?v=1Sz_K8UID6E

http://www.slideshare.net/DataStax/real-world-repairs-vinay-chella-netflix-cassandra-summit-2016

https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/

https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html

https://www.datastax.com/dev/blog/repair-in-cassandra




From: eugene miretsky 
<eugene.miret...@gmail.com>
Date: Thursday, April 6, 2017 at 3:35 PM
To: <user@cassandra.apache.org>
Subject: Why are automatic anti-entropy repairs required when hinted hand-off 
is enabled?

Hi,

As I see it, if hinted handoff is enabled, the only time data can be 
inconsistent is when:

  1.  A node is down for longer than the max_hint_window
  2.  The coordinator node crashes before all the hints have been replayed
Why is it still recommended to perform frequent automatic repairs, as well as to enable read repair? Can't I just run a repair after one of the nodes has been down? The only problem I see with this approach is a long repair job (instead of small incremental repairs). But other than that, are there any other issues/corner cases?
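
For reference, the one-off repair described above (run after a node has been down longer than max_hint_window) would look roughly like this, with a made-up keyspace name:

    # On the node that was down, once it is back up:
    nodetool repair -full my_keyspace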

Cheers,
Eugene


