[ 
https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854876#comment-17854876
 ] 

Stefan Miklosovic edited comment on CASSANDRA-18111 at 6/13/24 11:25 PM:
-------------------------------------------------------------------------

https://github.com/apache/cassandra/pull/3374

I went with the WatchService approach. When somebody deletes a snapshot from 
disk by hand, the deletion is detected and the snapshot is removed from 
SnapshotManager, so it is no longer visible in nodetool listsnapshots or 
in system_views.snapshots. 
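
To illustrate the idea (this is a minimal sketch, not the code from the PR): 
the class name SnapshotDirWatcher and its methods are hypothetical, and it 
watches a single directory whose subdirectories stand in for snapshots, 
whereas a real node has one snapshots directory per table. It uses only the 
standard java.nio.file.WatchService API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: keeps an in-memory set of snapshot names in sync
// with manual deletions on disk.
public class SnapshotDirWatcher implements AutoCloseable {
    private final Path snapshotsDir;
    private final WatchService watcher;
    private final Set<String> knownSnapshots = ConcurrentHashMap.newKeySet();

    public SnapshotDirWatcher(Path snapshotsDir) throws IOException {
        this.snapshotsDir = snapshotsDir;
        this.watcher = snapshotsDir.getFileSystem().newWatchService();
        // Register before scanning so a deletion between the two is not missed.
        snapshotsDir.register(watcher, StandardWatchEventKinds.ENTRY_DELETE);
        knownSnapshots.addAll(namesOnDisk());
    }

    // Lists the subdirectory names currently present on disk.
    private Set<String> namesOnDisk() throws IOException {
        Set<String> names = new HashSet<>();
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(snapshotsDir, p -> Files.isDirectory(p))) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }

    // Waits up to timeoutMillis for a first event, then drains what is pending.
    public void processEvents(long timeoutMillis) throws IOException, InterruptedException {
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        while (key != null) {
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                    // Events were lost: fall back to one rescan of the directory.
                    knownSnapshots.retainAll(namesOnDisk());
                } else {
                    // For ENTRY_DELETE the context is the deleted entry's relative path.
                    Path deleted = (Path) event.context();
                    knownSnapshots.remove(deleted.toString());
                }
            }
            if (!key.reset()) {
                break; // the watched directory itself is gone
            }
            key = watcher.poll();
        }
    }

    // Served from memory only.
    public Set<String> list() {
        return Set.copyOf(knownSnapshots);
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

In this sketch processEvents would be called from a background thread in a 
loop. How quickly a deletion is noticed depends on the platform's 
WatchService implementation. 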

At the same time, we do not go to disk when we list snapshots (nodetool 
/ JMX / CQL). 

We go to disk only when we 1) start a node and load all snapshots or 2) remove 
a snapshot. Listing is IO-free. 
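
A rough sketch of that access pattern, assuming a hypothetical SnapshotCache 
class (the names and the single-directory layout are illustrative and not 
taken from the PR): the directory is scanned once at startup, list() answers 
from memory, and clear() is the only other operation that touches disk.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical in-memory snapshot registry: tag -> snapshot directory.
public class SnapshotCache {
    private final Map<String, Path> snapshots = new ConcurrentHashMap<>();

    // Disk access 1: scan the snapshots directory once, at startup.
    public void loadAll(Path snapshotsDir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(snapshotsDir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    snapshots.put(entry.getFileName().toString(), entry);
                }
            }
        }
    }

    // No file system call here: the answer comes from the map.
    public Set<String> list() {
        return Set.copyOf(snapshots.keySet());
    }

    // Disk access 2: removing a snapshot deletes its files and drops the entry.
    public boolean clear(String tag) throws IOException {
        Path dir = snapshots.remove(tag);
        if (dir == null) {
            return false;
        }
        if (Files.exists(dir)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order puts children before their parent directory.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        return true;
    }
}
```

Because list() never consults the disk, a snapshot deleted by hand would stay 
listed in this sketch until something removes it from the map, which is the 
gap the WatchService detection is meant to close. 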

cc [~rustyrazorblade] [~jjirsa] to let you know, since this concerns IO and we 
were recently dealing with the same thing for hints, if you remember. I am 
trying not to go to disk while listing here either.


> Cache snapshots in memory
> -------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are 
> scanned to find snapshots, which is inefficient.
> For example, fetching the 
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric 
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via 
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
