Re: The most reliable way to determine the last time node was up

Paulo Motta Wed, 03 Nov 2021 15:45:33 -0700

> I would expect that if nobody talks to a node and no operation is
> running, it does not produce any "side effects".


In order to track the last checkpoint timestamp you need to persist it
periodically to protect against losing state during an ungraceful shutdown
(i.e. kill -9).

However, you're right that this may generate tons of sstables if we're
persisting it periodically to a system table, even if we skip the commit
log. We could tune system.local compaction to use LCS, but it would still
generate periodic compaction activity. In this case an external marker file
sounds much simpler and cleaner.
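
A minimal sketch of the marker file idea, assuming the file's modification
time serves as the checkpoint timestamp. The class name CheckpointMarker,
the thread name and the period are hypothetical and not part of Cassandra.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not a Cassandra class): keeps a marker file's mtime fresh.
final class CheckpointMarker {
    private final Path marker;

    CheckpointMarker(Path marker) {
        this.marker = marker;
    }

    // Create the empty marker if it is missing, then set its mtime to "now".
    // Assumes the parent directory already exists.
    void touch() {
        try {
            if (Files.notExists(marker)) {
                Files.createFile(marker);
            }
            Files.setLastModifiedTime(marker, FileTime.from(Instant.now()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Last checkpoint time, or empty if the marker is missing (e.g. it was removed).
    Optional<Instant> lastCheckpoint() {
        try {
            return Optional.of(Files.getLastModifiedTime(marker).toInstant());
        } catch (NoSuchFileException e) {
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Touch the marker every periodSeconds on a daemon thread.
    // Note: if touch() throws, scheduleAtFixedRate suppresses later runs.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "checkpoint-marker");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(this::touch, 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Since only the file's metadata changes, nothing goes through the commit log
or produces sstables. The returned executor would be shut down on drain,
after one final touch().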

The downsides I see to the marker file approach are:
a) External clients cannot query last checkpoint time easily
b) The state is lost if the marker file is removed.

However, we could solve these issues by:
a) exposing the info via a system table
b) falling back to min(last commitlog/sstable timestamp)
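
The fallback in (b) could be sketched roughly as follows. This is an
illustration only: LastCheckpointFallback and its methods are hypothetical
names, file modification times stand in for the commitlog/sstable
timestamps, and the directory layout is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper (not a Cassandra class).
final class LastCheckpointFallback {
    // Newest modification time among regular files under dir, if any.
    static Optional<Instant> newestModification(Path dir) {
        if (!Files.isDirectory(dir)) {
            return Optional.empty();
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .map(LastCheckpointFallback::mtime)
                        .max(Instant::compareTo);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Instant mtime(Path p) {
        try {
            return Files.getLastModifiedTime(p).toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Marker time if present; otherwise min(last commitlog, last sstable) timestamp.
    static Optional<Instant> resolve(Optional<Instant> markerTime, Path commitLogDir, Path dataDir) {
        if (markerTime.isPresent()) {
            return markerTime;
        }
        Optional<Instant> c = newestModification(commitLogDir);
        Optional<Instant> s = newestModification(dataDir);
        if (c.isPresent() && s.isPresent()) {
            return Optional.of(c.get().isBefore(s.get()) ? c.get() : s.get());
        }
        return c.isPresent() ? c : s;
    }

    // True if the node was last seen alive longer ago than the allowed window.
    static boolean downTooLong(Instant lastCheckpoint, Instant now, Duration maxDowntime) {
        return Duration.between(lastCheckpoint, now).compareTo(maxDowntime) > 0;
    }
}
```

A startup check could then call resolve(...) and refuse to start when
downTooLong(last, Instant.now(), Duration.ofDays(10)) is true, with the 10
days taken from the example earlier in the thread.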

I prefer an explicit mechanism to track the last checkpoint (i.e. marker
file) over the implicit min(last commitlog/sstable timestamp), so we don't
create unnecessary coupling between different subsystems.

Cheers,

Paulo

On Wed, 3 Nov 2021 at 19:29, Stefan Miklosovic <
[email protected]> wrote:

> Yes this is the combination of system.local and "marker file"
> approach, basically updating that field periodically.
>
> However, when there is a mutation done against the system table (in
> this example), it goes to a commit log and then it will be propagated
> to sstable on disk, no? So in our hypothetical scenario, if a node is
> not touched by anybody, it would still behave like it _does_
> something. I would expect that if nobody talks to a node and no
> operation is running, it does not produce any "side effects".
>
> I just do not want to generate any unnecessary noise. A node which
> does not do anything should not change its data. I am not sure if it
> is like that already or if an inactive node still writes new
> sstables after some time, but I doubt that.
>
> On Wed, 3 Nov 2021 at 22:58, Paulo Motta <[email protected]> wrote:
> >
> > How about a last_checkpoint (or better name) system.local column that is
> > updated periodically (i.e. every minute) + on drain? This would give a
> > lower time bound on when the node was last live without requiring an
> > external marker file.
> >
> > On Wed, 3 Nov 2021 at 18:03 Stefan Miklosovic <
> > [email protected]> wrote:
> >
> > > The third option would be to have some thread running in the
> > > background "touching" some (empty) marker file. It is the simplest
> > > solution, but I do not like the idea of this marker file; it feels
> > > dirty. But hey, as long as it would be an opt-in feature for people
> > > who know what they want, why not, right ...
> > >
> > > On Wed, 3 Nov 2021 at 21:53, Stefan Miklosovic
> > > <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > We see a lot of cases out there where a node was down for longer
> > > > than the GC period, and once that node is up there are a lot of
> > > > zombie data issues ... you know the story.
> > > >
> > > > We would like to implement some kind of check which would detect
> > > > this, so that the node would not start in the first place and no
> > > > issues would arise at all. It would be up to operators to figure out
> > > > first what to do with it.
> > > >
> > > > There are a couple of ideas we were exploring with various pros and
> > > > cons and I would like to know what you think about them.
> > > >
> > > > 1) Register a shutdown hook on "drain". This is already there (1).
> > > > The "drain" method does quite a lot of stuff and is called on
> > > > shutdown, so our idea is to write a timestamp to system.local into a
> > > > new column like "lastly_drained" or something like that, and it would
> > > > be read on startup.
> > > >
> > > > The disadvantage of this approach, or of all approaches via shutdown
> > > > hooks, is that it will react only to SIGTERM and SIGINT. If that
> > > > node is killed via SIGKILL, the JVM just stops and we have basically
> > > > no guarantee that anything would leave traces behind.
> > > >
> > > > If it is killed and that value is not overwritten, on the next
> > > > startup the stale value might be older than 10 days, so the check
> > > > would falsely conclude that the node should not be started.
> > > >
> > > > 2) Doing this on startup: you would check how old all your sstables
> > > > and commit logs are, and if no file was modified less than 10 days
> > > > ago you would abort the start. There is a pretty big chance that your
> > > > node did at least something in 10 days. Nothing needs to be added to
> > > > system tables or similar, and it would be just another StartupCheck.
> > > >
> > > > The disadvantage of this is that some dev clusters, for example, may
> > > > run more than 10 days and they are just sitting there doing absolutely
> > > > nothing at all, nobody interacts with them, nobody is repairing them,
> > > > they are just sitting there. So when nobody talks to these nodes, no
> > > > files are modified, right?
> > > >
> > > > It seems like there is no silver bullet here, what is your opinion on
> > > > this?
> > > >
> > > > Regards
> > > >
> > > > (1) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L786-L799
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
>
