> Yes, but that doesn't really provide the monitoring that will really
> be helpful. If I don't realize it for 2 days, then we could
> potentially be returning inconsistent results, or not have data in
> sync for 2 days until repair is run. It would be best to be able to
> monitor these things so that repair can be run as soon as it is
> required (e.g. node down). Having such monitoring will also be helpful
> for the operations team, who may not know all the internals of
> Cassandra.

For the purpose of this discussion, nodes are always down in any
non-trivial time window. You may have flapping in the ring, individual
requests may time out, etc. Do not assume repair is not required just
because you have not had some kind of major outage where a human
became consciously aware that a node was officially "down".

Unless you really know what you're doing, the thing to monitor is the
completion of repairs at sufficient frequency. In the event that
repair *doesn't* run, there needs to be enough time left until
tombstone expiry for someone to take some kind of action (whether that
is re-running repair or temporarily re-configuring GCGraceSeconds is
another matter).
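Concretely, that means alerting on "time since last successful repair"
rather than on node state. The following is a minimal sketch of such a
check; the marker file (touched by your repair job on success), the
thresholds and the Nagios-style exit codes are all assumptions for
illustration, not a recommendation:

import os
import sys
import time

MARKER = "/var/run/cassandra-last-repair-ok"  # hypothetical: touched by the repair job on success
GC_GRACE_SECONDS = 10 * 24 * 3600  # must match gc_grace_seconds in your schema
MARGIN = 2 * 24 * 3600             # time you want left for a human to react

def check():
    try:
        age = time.time() - os.path.getmtime(MARKER)
    except OSError:
        return 2, "CRITICAL: no completed repair on record"
    remaining = GC_GRACE_SECONDS - age
    if remaining <= 0:
        return 2, "CRITICAL: no repair completed within GCGraceSeconds"
    if remaining < MARGIN:
        return 1, "WARNING: %.1f days until tombstone expiry" % (remaining / 86400.0)
    return 0, "OK: last repair completed %.1f days ago" % (age / 86400.0)

if __name__ == "__main__":
    status, message = check()
    print(message)
    sys.exit(status)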

Repair is not something you run only in the event of some major issue;
it is a regularly scheduled operation on a typical cluster.

The invariant required by Cassandra is that repairs complete prior to
tombstones expiring (see URL in previous e-mail). Some applications,
given some combination of consistency levels, use-case and
requirements, may benefit from more frequent repair. But the important
part is the minimum repair frequency mandated by Cassandra, as
determined by GCGraceSeconds.
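To illustrate the arithmetic: with the default gc_grace_seconds of
864000 (10 days), a weekly schedule whose full pass takes about a day
leaves roughly two days of slack. Only the 10-day default comes from
Cassandra itself; the interval and duration below are made-up numbers:

gc_grace_seconds = 864000        # 10 days, the Cassandra default
repair_interval = 7 * 24 * 3600  # repair scheduled weekly (hypothetical)
repair_duration = 1 * 24 * 3600  # assume a full pass takes about a day

# How long operators have to react if a scheduled run fails, before
# tombstones start expiring. This must stay comfortably positive.
slack = gc_grace_seconds - (repair_interval + repair_duration)
assert slack > 0, "repair schedule violates the GCGraceSeconds invariant"
print("time to react if a run fails: %.1f days" % (slack / 86400.0))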

-- 
/ Peter Schuller
