Hi Jayesh,

Your statements are mostly right, except:
Yes, compactions do purge tombstones, but that *does not prevent resurrection*.
A resurrection can take place in this situation:

Node A:
Key A is written
Key A is deleted

Node B:
Key A is written
(the deletion never happens, for example because of a dropped mutation)

Then after gc_grace_seconds:
Node A:
Compaction removes both the write and the tombstone, so the data is completely gone

Node B:
Still contains Key A

Then you run a repair:
Node A:
Receives Key A from Node B - and the deleted key is resurrected

Got it?
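
A minimal sketch of that timeline with the DataStax Python driver, if it
helps to see the moving parts (untested; the contact point, keyspace and
table name are made up for illustration):

from cassandra.cluster import Cluster

cluster = Cluster(['10.0.0.1'])      # assumed contact point
session = cluster.connect('ks')      # assumed keyspace

# gc_grace_seconds (default 864000 = 10 days) is the window during which
# the tombstone is kept so that repair can still propagate the delete to
# replicas that missed it.
session.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id int PRIMARY KEY,
        payload text
    ) WITH gc_grace_seconds = 864000
""")

session.execute("INSERT INTO events (id, payload) VALUES (1, 'key A')")
session.execute("DELETE FROM events WHERE id = 1")   # writes a tombstone

# If the DELETE mutation is dropped on one replica and no repair runs
# before gc_grace_seconds expires, the other replicas compact away both
# the write and the tombstone, and a later repair streams the stale row
# back in - exactly the resurrection above. That is why the usual advice
# is to run a full repair at least once every gc_grace_seconds.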

But I was thinking a bit about your situation. If you NEVER do deletes and
have ONLY TTLs, this could change the game. The difference? If you have only
TTLs, the delete information and the write information always reside on
the same node and never exist alone, so the write-delete pair should
always be consistent. As far as I can see, there will be no resurrections
then.
BUT: Please don't hold me to that. *I have neither tested it nor read
the source code to prove it in theory.*
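
To make the TTL-only pattern concrete, a rough sketch (same caveats as
above: untested, table name and values are invented):

from datetime import datetime
from cassandra.cluster import Cluster

session = Cluster(['10.0.0.1']).connect('ks')   # assumed cluster/keyspace

# Per-write TTL: the expiration travels with the write itself, so there is
# never a separate delete mutation that could get dropped on one replica.
session.execute(
    "INSERT INTO metrics (id, ts, value) VALUES (%s, %s, %s) USING TTL 86400",
    (1, datetime.utcnow(), 42.0))

# Alternatively, a table-wide default so every insert expires automatically:
session.execute("ALTER TABLE metrics WITH default_time_to_live = 86400")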

Maybe some others have more thoughts or information on this.

By the way:
C* itself is not fragile. Distributed systems are. It's like the old
saying: things that can go wrong will go wrong. Networks fail, hardware
fails, software fails. You can have timeouts, dropped messages (timeouts
help a cluster/node survive high-pressure situations), a crashed daemon.
Yes, things go wrong. All the time. Even on a single-node system (like
MySQL), ensuring absolute consistency is not easy and requires many safety
nets like unbuffered I/O and battery-backed disk controllers, which can
hurt performance a lot.

You could also create a perfectly consistent distributed system like C*,
but it would be slow and either not partition tolerant or not highly
available.

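If you want to see where that trade-off bites in practice, here is a made-up
read at the "consistent but fragile" end of the spectrum (again just a
sketch, not production advice):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['10.0.0.1']).connect('ks')   # assumed cluster/keyspace

# CL=ALL gives the strongest read consistency, but the query fails as soon
# as a single replica is unavailable - the "not highly available" corner.
stmt = SimpleStatement(
    "SELECT payload FROM events WHERE id = 1",
    consistency_level=ConsistencyLevel.ALL)
row = session.execute(stmt).one()
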
2017-02-28 16:06 GMT+01:00 Thakrar, Jayesh <jthak...@conversantmedia.com>:

> Thanks - getting a better picture of things.
>
>
>
> So "entropy" is tendency of a C* datastore to be inconsistent due to
> writes/updates not taking place across ALL nodes that carry replica of a
> row (can happen if nodes are down for maintenance)
>
> It can also happen due to node crashes/restarts that can result in loss of
> uncommitted data.
>
> This can result in either stale data or ghost data (column/row
> re-appearing after a delete).
>
> So there are the "anti-entropy" processes in place to help with this
>
> - hinted handoff
>
> - read repair (can happen while performing a consistent read OR also async
> as driven/configured by *_read_repair_chance AFTER consistent read)
>
> - commit logs
>
> - explicit/manual repair via command
>
> - compaction (compaction is an indirect mechanism to purge tombstones,
> thereby ensuring that stale data will NOT resurrect)
>
>
>
> So for an application where you have only timeseries data or where data is
> always inserted, I would like to know the need for manual repair?
>
>
>
> I see/hear advice that there should always be a periodic (mostly weekly)
> manual/explicit repair in a C* system - and that's what I am trying to
> understand.
>
> Repair is a really expensive process and I would like to justify the need
> to expend resources (when and how much) for it.
>
>
>
> Among other things, this advice also gives an impression to people not
> familiar with C* (e.g. me) that it is too fragile and needs substantial
> manual intervention.
>
>
>
> Appreciate all the feedback and details that you have been sharing.
>
>
>
> *From: *Edward Capriolo <edlinuxg...@gmail.com>
> *Date: *Monday, February 27, 2017 at 8:00 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Cc: *Benjamin Roth <benjamin.r...@jaumo.com>
> *Subject: *Re: Is periodic manual repair necessary?
>
>
>
> There are 4 anti-entropy systems in Cassandra.
>
>
>
> Hinted handoff
>
> Read repair
>
> Commit logs
>
> Repair command
>
>
>
> All are basically best effort.
>
>
>
> Commit logs get corrupt and only flush periodically.
>
>
>
> Bits rot on disk and while crossing the network
>
>
>
> Read repair is async and only happens randomly
>
>
>
> Hinted handoff stops after some time and is not guaranteed.
> On Monday, February 27, 2017, Thakrar, Jayesh <
> jthak...@conversantmedia.com> wrote:
>
> Thanks Roth and Oskar for your quick responses.
>
>
>
> This is a single datacenter, multi-rack setup.
>
>
>
> > A TTL is technically similar to a delete - in the end both create
> tombstones.
>
> >If you want to eliminate the possibility of resurrected deleted data, you
> should run repairs.
>
> So why do I need to worry about data resurrection?
>
> Because the TTL for the data is specified at the row level (at least in
> this case) i.e. across ALL columns across ALL replicas.
>
> So they all will have the same data or won't have the data at all (i.e. it
> would have been tombstoned).
>
>
>
>
>
> > If you can guarantee 100% that data is read-repaired before
> gc_grace_seconds after the data has been TTL'ed, you won't need an extra
> repair.
>
> Why read-repaired before "gc_grace_period"?
>
> Isn't gc_grace_period the grace period for compaction to occur?
>
> So if the data was not consistent and read-repair happens before that,
> then well and good.
>
> Does read-repair not happen after gc/compaction?
>
> If this table has data being constantly/periodically inserted, then
> compaction will also happen accordingly, right?
>
>
>
> Thanks,
>
> Jayesh
>
>
>
>
>
> *From: *Benjamin Roth <benjamin.r...@jaumo.com>
> *Date: *Monday, February 27, 2017 at 11:53 AM
> *To: *<user@cassandra.apache.org>
> *Subject: *Re: Is periodic manual repair necessary?
>
>
>
> A TTL is technically similar to a delete - in the end both create
> tombstones.
>
> If you want to eliminate the possibility of resurrected deleted data, you
> should run repairs.
>
>
>
> If you can guarantee 100% that data is read-repaired before
> gc_grace_seconds after the data has been TTL'ed, you won't need an extra
> repair.
>
>
>
> 2017-02-27 18:29 GMT+01:00 Oskar Kjellin <oskar.kjel...@gmail.com>:
>
> Are you running multi dc?
>
> Sent from my iPad
>
>
> On 27 Feb 2017 at 16:08, Thakrar, Jayesh <jthak...@conversantmedia.com>
> wrote:
>
> Suppose I have an application, where there are no deletes, only 5-10% of
> rows being occasionally updated (and that too only once) and a lot of reads.
>
>
>
> Furthermore, I have replication = 3 and both read and write are configured
> for local_quorum.
>
>
>
> Occasionally, servers do go into maintenance.
>
>
>
> I understand that when the maintenance is longer than the period for which
> hinted handoffs are preserved, they are lost and servers may have stale
> data.
>
> But I do expect it to be rectified on reads. If the stale data is not read
> again, I don’t care for it to be corrected as then the data will be
> automatically purged because of TTL.
>
>
>
> In such a situation, do I need to have a periodic (weekly?) manual/batch
> read_repair process?
>
>
>
> Thanks,
>
> Jayesh Thakrar
>
>
>
>
>
> --
>
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>
>
