On Wed, Oct 23, 2013 at 5:23 AM, java8964 java8964 <java8...@hotmail.com> wrote:

> We enabled the major repair on every node every 7 days.

This is almost certainly the cause of your many duplicates. If you don't
DELETE heavily, consider changing gc_grace_seconds to 34 days and then
doing a repair on the first of the month.

> If one node persists a write, plus a "hint" for a failed replication
> write, this write will still be stored as one write in its SSTable files,
> right? Why does it need to store 2 copies as duplicates in SSTable files?

A write destined for replica nodes A, B, and C comes into A. The write
"fails" but actually succeeds in replicating to B. A writes it as a hint.
B flushes its memtable. A then delivers the hint to B, creating another
copy of the identical write in a memtable. B then flushes this new
memtable. There are now two copies of the same write on disk.

> Here is the duplication count that occurred in our SSTable files. You can
> see a lot of data duplicated 2 times, but also some with even higher
> counts. The max duplication count is 27; can one client retry 27 times?

That many duplicates are almost certainly a result of repair
over-repairing. Re-read this chunk from my previous mail:

> Repair has a fixed granularity, so the larger the size of your dataset
> the more "over-repair" any given "repair" will cause.
>
> Duplicates occur as a natural consequence of this: if you have 1 row
> which differs in a merkle tree chunk and the merkle tree chunk is, for
> example, 1000 rows, you will "repair" one row and "duplicate" the other
> 999.

Question #2 from your original mail is also almost certainly a result of
"over-repair." The "duplicate" chunks can be from any time.

=Rob

PS - What Cassandra version?
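To make the hint scenario above concrete, here is a minimal sketch (a toy
model in Python, not Cassandra code; the names `memtable`, `sstables`,
`write`, and `flush` are all hypothetical) of how a delivered hint that
lands in a fresh memtable leaves two identical copies of one write on disk:

```python
sstables = []   # each flush produces one immutable "SSTable" (a list of writes)
memtable = []   # in-memory buffer of (row_key, value, timestamp) writes

def write(row_key, value, timestamp):
    memtable.append((row_key, value, timestamp))

def flush():
    # Flushing snapshots the memtable to disk and starts a new, empty one.
    global memtable
    if memtable:
        sstables.append(list(memtable))
        memtable = []

# 1. Replica B receives the original write (coordinator A thinks it failed,
#    so A also stores a hint), then B flushes -> copy #1 on disk.
write("k1", "v1", 100)
flush()

# 2. A later delivers the hint: the identical write lands in B's new
#    memtable, which is flushed in turn -> copy #2 on disk.
write("k1", "v1", 100)
flush()

# Count on-disk copies of the identical write across all SSTables.
copies = sum(table.count(("k1", "v1", 100)) for table in sstables)
print(copies)  # -> 2
```

Compaction would eventually merge the two SSTables and reconcile the
duplicates back to a single copy, which is why this is an on-disk space
issue rather than a correctness issue.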
> We enabled the major repair on every node every 7 days. > This is almost certainly the cause of your many duplicates. If you don't DELETE heavily, consider changing gc_grace_seconds to 34 days and then doing a repair on the first of the month. > If one node persistent a write, plus a "hint" of failed replication write, > this write will still store as one write in its SSTable files, right? Why > need to store 2 copies as duplication in SSTable files? > Write destined for replica nodes A B C. Write comes into A. Write "fails" but actually succeeds in replicating to B. A writes it as a hint. B flushes its memtable. A then delivers hint to B, creating another copy of the identical write in a memtable. B then flushes this new memtable. There are now two copies of the same write on disk. > Here is the duplication count happened in our SSTable files. You can see a > lot of data duplicate 2 times, but also some with even higher number. But > max duplication count is 27, can one client retry 27 times? > This many duplicates are almost certainly a result of repair over-repairing. Re-read this chunk from my previous mail : > Repair has a fixed granularity, so the larger the size of your dataset the > more "over-repair" any given "repair" will cause. > > Duplicates occur as a natural consequences of this, if you have 1 row > which differs in the merkle tree chunk and the merkle tree chunk is, for > example, 1000 rows.. you will "repair" one row and "duplicate" the other > 999. > Question #2 from your original mail is also almost certainly a result of "over-repair." The "duplicate" chunks can be from any time. =Rob PS - What cassandra version?