Note that read repairs only occur for QUORUM/equivalent and higher, and
also with a 10% (default) chance on anything less than QUORUM
(ONE/LOCAL_ONE). This is configured at the table level through the
dclocal_read_repair_chance and read_repair_chance settings (which are going
away in 4.0). So if yo
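(For reference, these are per-table options in CQL prior to 4.0 - a minimal
sketch, with placeholder keyspace/table names:

    ALTER TABLE my_keyspace.my_table
      WITH dclocal_read_repair_chance = 0.1
      AND read_repair_chance = 0.0;

Here 0.1 corresponds to the 10% chance mentioned above.)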
Hi Ben,
That makes sense. I also read about "read repairs". So, once an
inconsistent record is read, Cassandra synchronizes its replicas on the
other nodes as well. I ran the same Spark query again, this time with the
default consistency level (LOCAL_ONE), and the result was correct.
Thanks again for the
Hi Faraz,
Yes, it likely does mean there is inconsistency in the replicas. However,
you shouldn’t be too freaked out about it - Cassandra is designed to allow
for this inconsistency to occur and the consistency levels allow you to
achieve consistent results despite replicas not being consistent. To k
Thanks a lot for the response.
Setting the consistency level to ALL/TWO started giving me consistent count
results in both cqlsh and Spark. As expected, my query time has increased
roughly 1.5x (before, it was taking ~1.6 hours; with consistency level ALL,
the same query takes ~2.4 hours to complete).
Does
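In cqlsh the consistency level can be changed for the current session before
re-running the count; a minimal example, with placeholder keyspace/table
names:

    CONSISTENCY ALL;
    SELECT COUNT(*) FROM my_keyspace.my_table;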
Both cqlsh and the Spark Cassandra connector query at consistency level ONE
(LOCAL_ONE for the Spark connector) by default, so any inconsistency in your
replicas can result in inconsistent query results.
See http://cassandra.apache.org/doc/latest/tools/cqlsh.html and
https://github.com/datasta
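On the Spark side, the read consistency level can be raised through the
connector's input consistency setting. A minimal sketch using the DataFrame
API; the host, keyspace and table names are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Ask the connector to read at CL ALL instead of its LOCAL_ONE default.
    val conf = new SparkConf()
      .set("spark.cassandra.connection.host", "127.0.0.1")
      .set("spark.cassandra.input.consistency.level", "ALL")

    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Count rows of a (placeholder) table through the connector.
    val count = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()
      .count()

    println(s"row count: $count")

The same property can also be passed on the command line via --conf.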
The fact that cqlsh itself gives different results tells me that this has
nothing to do with Spark. Moreover, the Spark results are monotonically
increasing, which seems more consistent than cqlsh, so I believe Spark can
be taken out of the equation.
Now, while you are running these queries is th