On Wed, Aug 7, 2024 at 6:39 PM Yifan Cai
<yc25c...@gmail.com> wrote:
With WatchService, when events are missed (which is to be expected), you will still need to list the files. It seems to me that WatchService doesn't offer significant benefits in this case.
Yeah, I think we will leave it out eventually.
Regarding listing the directory with a refresh flag, my concern is the potential for abuse. End-users might always refresh before listing, which could undermine the purpose of caching. Perhaps Jeremiah can provide more insight on this.
Well, by default, it would not be refreshed every single time. You would need to opt in to that. If there is a shop which has users with direct access to the disks of Cassandra nodes and they are removing data manually, I do not know what to say; what are nodetool clearsnapshot and the JMX methods good for then? I do not think we can prevent people from shooting themselves in the foot if they are absolutely willing to do that.
If they want to refresh that every time, that would be equal
to the current behavior. It would be at most as "bad" as it
is now.
IMO, caching is best handled internally. I have a few
UX-related questions:
- Is it valid or acceptable to return stale data? If so,
end-users have to do some form of validation before
consuming each snapshot to account for potential deletions.
answer below
- Even if listsnapshots returns the most recent data, is it possible that some of the directories get deleted while end-users are accessing them? I think it is. That, then, forces end-users to do some validation first, similar to handling stale data.
I think what you were trying to say is that when somebody lists snapshots at time T0 and somebody removes a snapshot manually at T1, then the list of snapshots is no longer accurate? Sure. That is a thing. This is how it currently works.
Now, we want to cache them, so if you clear a snapshot which is not physically there because somebody removed it manually, that should be a no-op; it will just be removed from the internal tracker. So, if it is on disk and in the cache and you clear it, then all is fine. It is also fine if it is no longer on disk and you clear it; then it is just removed internally. It would fail only in the case where you want to remove a snapshot which is not cached, regardless of whether it is on disk or not. The deletion of a non-existent snapshot ends up with a failure; nothing should change in that regard, as this is the current behavior too.
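Those clearing semantics could be sketched roughly like this (a hypothetical sketch only; the class and method names are illustrative, not Cassandra's actual internals):

```java
// Hypothetical sketch of the clearing semantics described above; the class
// and method names are illustrative, not Cassandra's actual internals.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SnapshotTracker {
    private final Map<String, Path> cached = new ConcurrentHashMap<>();

    void track(String tag, Path dir) { cached.put(tag, dir); }

    void clearSnapshot(String tag) throws IOException {
        Path dir = cached.remove(tag);
        if (dir == null)
            // Not cached: fails, just like deleting a non-existing snapshot today.
            throw new IllegalArgumentException("Snapshot " + tag + " does not exist");
        if (Files.exists(dir))
            deleteRecursively(dir); // on disk and in cache: the normal path
        // Not on disk anymore (removed manually): dropping it from the
        // tracker above is enough; the on-disk deletion is a no-op.
    }

    private void deleteRecursively(Path dir) throws IOException {
        try (var paths = Files.walk(dir)) {
            // Delete children before parents.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }
}
```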
I want to say that I did not write it completely correctly at the very beginning of this thread. Currently, we are caching only _expiring_ snapshots, because we need to know their time of removal so we can act on it later. _normal_ snapshots are _not_ cached _yet_. I spent so much time with CASSANDRA-18111 that I live in a reality where it is already in; I forgot this is not actually in place yet, though we are very close to that.
OK thank you all for your insights, I will NOT use inotify.
Just my 2 cents.
- Yifan
On Wed, Aug 7, 2024 at 6:03 AM Štefan Miklošovič
<smikloso...@apache.org> wrote:
Yes, for example as reported here
https://issues.apache.org/jira/browse/CASSANDRA-13338
People who are charting this in monitoring
dashboards might also hit this.
On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan
<jeremiah.jor...@gmail.com> wrote:
If you have a lot of snapshots and have for
example a metric monitoring them and their
sizes, if you don’t cache it, creating the
metric can cause performance degradation. We
added the cache because we saw this happen to
databases more than once.
> On Aug 7, 2024, at 7:54 AM, Josh McKenzie
<jmcken...@apache.org> wrote:
>
>
>>
>> Snapshot metadata are currently stored in
memory / they are cached so we do not need to go
to disk every single time we want to list them,
the more snapshots we have, the worse it is.
> Are we enumerating our snapshots somewhere on
the hot path, or is this performance concern
misplaced?
>
>> On Wed, Aug 7, 2024, at 7:44 AM, Štefan
Miklošovič wrote:
>> Snapshot metadata are currently stored in
memory / they are cached so we do not need to go
to disk every single time we want to list them,
the more snapshots we have, the worse it is.
>>
>> When a snapshot is _manually_ removed from
disk, not from nodetool clearsnapshot, just by
rm -rf on a respective snapshot directory, then
such snapshot will be still visible in nodetool
listsnapshots. Manual removal of a snapshot
might be done e.g. by accident or by some
"impatient" operator who just goes to disk and
removes it there instead of using nodetool or
respective JMX method.
>>
>> To improve UX here, what I came up with is that we might use Java's WatchService, where each snapshot dir would be registered. WatchService is part of Java; it uses the inotify subsystem which the Linux kernel offers. The result is that once a snapshot dir is registered to be watched and it is then removed, we are notified about that via inotify / WatchService, so we can react to it and remove the in-memory representation, so it will not be visible in the output anymore.
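A minimal sketch of this approach, assuming a flat layout where snapshot directories are children of one parent directory (all names are illustrative; as a simplification, this watches the parent for ENTRY_DELETE events rather than registering each snapshot dir individually):

```java
// Illustrative sketch only: watch the parent of the snapshot directories and
// evict a snapshot from the in-memory tracker when its directory disappears.
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SnapshotWatcher implements Runnable {
    private final WatchService watcher;
    private final Path snapshotsRoot; // parent dir whose children are snapshot dirs
    final List<Path> evicted = Collections.synchronizedList(new ArrayList<>());

    SnapshotWatcher(Path snapshotsRoot) throws IOException {
        this.snapshotsRoot = snapshotsRoot;
        this.watcher = FileSystems.getDefault().newWatchService();
        // ENTRY_DELETE fires when a child (a snapshot dir) is removed.
        snapshotsRoot.register(watcher, StandardWatchEventKinds.ENTRY_DELETE);
    }

    @Override
    public void run() {
        try {
            while (true) {
                WatchKey key = watcher.take(); // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.ENTRY_DELETE)
                        // Evict the in-memory representation of this snapshot.
                        evicted.add(snapshotsRoot.resolve((Path) event.context()));
                }
                if (!key.reset())
                    break; // the watched directory itself is gone
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that, as discussed below, WatchService is backed by inotify only on Linux; other platforms fall back to different (sometimes polling-based) implementations.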
>>
>> While this works, there are some questions /
concerns
>>
>> 1) What do people think about inotify in general? I tested this on 10k snapshots and it seems to work just fine; nevertheless, there is in general no strong guarantee that every single event will come through, and there is also a family of kernel parameters around this where more tuning can be done, etc. It is also questionable how this will behave on systems other than Linux (macOS etc.). While the JRE running on other platforms also implements this, I am not completely sure those implementations are quality-wise on par with Linux. There is a history of not-so-great implementations on other systems (events not coming through on Macs etc.), and while I think we are safe on Linux, I am not sure we want to go with this elsewhere.
>>
>> 2) inotify brings more entropy into the codebase; it is another thing we need to take care of, etc. (however, it is all concentrated in one class and pretty much "isolated" from everything else)
>>
>> I made this feature optional and it is turned off by default, so people need to explicitly opt in and we are not forcing it on anybody.
>>
>> If we do not want to go with inotify, another option would be to have a background thread which would periodically check if a manifest exists on disk; if it does not, then the snapshot does not either. While this works, what I do not like about it is that the primary reason we moved the metadata to memory was to bypass IO as much as possible, yet here we would introduce another check which goes to disk, and periodically at that, which defeats the whole purpose. If an operator lists snapshots once a week and there is a background check running every 10 minutes (for example), then the cumulative number of IO operations might be bigger than just doing nothing at all. For this reason, if we do not want to go with inotify, I would also not implement any automatic background check. Instead, there would be a SnapshotManagerMbean#refresh() method which would just forcibly reload all snapshots from scratch. I think that manual deletion of snapshots is not something a user would do regularly; snapshots are meant to be removed via nodetool or JMX. If manual removal ever happens, then in order to make things synchronized again, refreshing these snapshots would be required. There might be an additional flag in nodetool listsnapshots; once specified, refreshing would be done, otherwise not.
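The proposed refresh() could look roughly like this (a hedged sketch: the MBean method name comes from the mail above, everything else is illustrative):

```java
// Hedged sketch of the proposed SnapshotManagerMbean#refresh(): drop the
// cache and rebuild it from what is actually on disk. The method name is
// taken from the discussion above; everything else is illustrative.
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface SnapshotManagerMbean {
    void refresh() throws IOException;
}

class SnapshotManager implements SnapshotManagerMbean {
    private final Path snapshotsRoot; // parent dir whose children are snapshot dirs
    final Map<String, Path> cached = new ConcurrentHashMap<>();

    SnapshotManager(Path snapshotsRoot) { this.snapshotsRoot = snapshotsRoot; }

    @Override
    public void refresh() throws IOException {
        // Forcibly reload all snapshots from scratch.
        cached.clear();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(snapshotsRoot)) {
            for (Path dir : dirs)
                if (Files.isDirectory(dir))
                    cached.put(dir.getFileName().toString(), dir);
        }
    }
}
```

With something like this in place, the optional nodetool listsnapshots flag mentioned above would simply trigger refresh() before listing.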
>>
>> How does this all sound to people?
>>
>> Regards
>