Hi Sylvain,
Thanks for the explanation :-) However, in this case I still do not get
why this (probably) gcable tombstone on 2:1 could cause this mess. Since
AE ignores only the tombstone itself (meaning that, from repair's point
of view, there is no data for this key on node 2:1), repair should fix
this inconsistency by streaming the missing data from 1:7 (so after the
repair, 2:1 will hold both the live data and the gcable tombstone),
shouldn't it?
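To spell that expectation out, here is a toy last-write-wins model (purely illustrative; none of these names are Cassandra internals, and gc_grace is assumed to be the 10-day default):

```python
# Toy last-write-wins model of the expected repair outcome; purely
# illustrative, not Cassandra's actual code paths.
GC_GRACE = 10 * 24 * 3600  # seconds; assumes the default grace period

def visible_to_repair(cell, now):
    """Repair's validation ignores tombstones older than gc_grace."""
    return not (cell["tombstone"] and now - cell["ts"] > GC_GRACE)

def reconcile(a, b):
    """Last-write-wins: the cell with the newer timestamp survives."""
    return a if a["ts"] >= b["ts"] else b

now = 100 * 24 * 3600
x = {"tombstone": True, "ts": now - 20 * 24 * 3600, "value": None}   # gcable tombstone on 2:1
y = {"tombstone": False, "ts": now - 5 * 24 * 3600, "value": "row"}  # newer live data on 1:7

# From repair's point of view, 2:1 holds nothing for the key ...
assert not visible_to_repair(x, now)
# ... and reconciling the streamed row with the old tombstone keeps the
# live data, so 2:1 ends up with both the row and the stale tombstone:
assert reconcile(x, y)["value"] == "row"
```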
Yes, I'm considering running a major compaction (we store about 200KB of
data in this CF, so it's not a problem at all ;-) ), but before I do I
want to make sure I understand the problem. So as long as I can live
with QUORUM reads/writes, I'll hold off on compacting and play a bit
with this problem :-)
M.
On 04.04.2013 12:28, Sylvain Lebresne wrote:
I'm considering a problem related to this issue:
https://issues.apache.org/jira/browse/CASSANDRA-4905
Let's say the tombstone on one of the nodes (X) is gcable and was not
compacted (purged) so far. After it was created we re-created this row,
but due to some problems it was written only to the second node (Y), so we
have "live" data on node Y that is newer than the gcable tombstone on
replica node X. We did NOT repair our cluster for a while (well, a pretty
long while), so it's possible that such a situation happened.
That would be my bet, yes.
My concern is: will AntiEntropy ignore this tombstone only, or basically
everything related to the row key that this tombstone was created for?
It will only ignore the tombstone itself.
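A toy sketch of the validation side (an illustrative model under assumed defaults, not Cassandra's actual Merkle-tree code): because a gcable tombstone is skipped when the row digest is computed, a node holding only that tombstone hashes the same as a node holding nothing, so its tree differs from the replica with the live row and the range gets streamed.

```python
import hashlib

GC_GRACE = 10 * 24 * 3600  # assumed default grace period, in seconds

def row_digest(cells, now):
    """Digest over the cells a validation pass actually sees; gcable
    tombstones are skipped, as if the node had no data for the key."""
    h = hashlib.sha256()
    for c in cells:
        if c["tombstone"] and now - c["ts"] > GC_GRACE:
            continue  # ignored, exactly like an empty row
        h.update(repr((c["ts"], c["tombstone"], c.get("value"))).encode())
    return h.hexdigest()

now = 100 * 24 * 3600
x_cells = [{"tombstone": True, "ts": now - 20 * 24 * 3600}]               # only a gcable tombstone
y_cells = [{"tombstone": False, "ts": now - 5 * 24 * 3600, "value": "row"}]

# X digests like an empty row ...
assert row_digest(x_cells, now) == row_digest([], now)
# ... and differs from Y, so repair flags the range and streams it:
assert row_digest(x_cells, now) != row_digest(y_cells, now)
```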
In theory, that older-than-gc_grace tombstone should eventually be
reclaimed by compaction, though it's not guaranteed that it will be by the
first compaction that includes it (if you use SizeTieredCompaction, a
major compaction would ensure that you get rid of it; that being said, I'm
not necessarily advising a major compaction — if you can afford to wait
for normal compaction to get rid of it, that's probably simpler).
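The "not guaranteed by the first compaction" caveat comes down to a safety check, sketched here in simplified form (the names and the exact rule are illustrative, not Cassandra's code): a tombstone can only be dropped once it is past gc_grace AND no sstable outside the compaction may still hold shadowed data for the key, otherwise dropping it could resurrect that data. A major compaction includes every sstable, so the second condition holds trivially.

```python
# Simplified sketch of the tombstone purge decision; illustrative names,
# not Cassandra's actual implementation.
GC_GRACE = 10 * 24 * 3600  # assumed default grace period, in seconds

def purgeable(deletion_time, now, key_in_uncompacted_sstables):
    """A tombstone may be dropped only if it is past gc_grace and no
    sstable left out of this compaction can still contain the key."""
    past_grace = now - deletion_time > GC_GRACE
    return past_grace and not key_in_uncompacted_sstables

now = 1_000_000_000
old = now - 20 * 24 * 3600  # well past gc_grace

# A minor compaction that leaves out an sstable containing the key
# cannot purge the tombstone, even though it is gcable:
assert purgeable(old, now, key_in_uncompacted_sstables=True) is False

# A major compaction includes every sstable, so the check passes:
assert purgeable(old, now, key_in_uncompacted_sstables=False) is True
```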
--
Sylvain
If it's not the case, here are the answers you asked for :-)
What version are you on ?
1.2.1
(plus CASSANDRA-5298 & CASSANDRA-5299 patches to be exact ;-) )
Can you run a repair on the CF and check:
Does the repair detect differences in the CF and stream changes ?
After the streaming does it run a secondary index rebuild on the new
sstable ? (Should be in the logs)
I'm attaching a log file (cssa-repair.log).
Just to clarify: the key I use for tests belongs to node *:1:7, and *:2:1
is a replica for that node (checked with nodetool getendpoints). Yesterday
I was repairing this CF cluster-wide, but to (hopefully) make debugging
simpler, what I'm sending you relates only to these two nodes.
So as I understand these logs: no differences were detected and nothing
was streamed. Indexes have not been rebuilt, obviously.
However, in that case I'd expect to see "Nothing to repair for
keyspace production" in the nodetool output - am I wrong? I'm a
bit confused by the info I get here ;-)
Can you provide the full query trace ?
I'm attaching two files, as the query trace is pretty long: no-index.log
(query by row key) and index.log (query by indexed column).
M.