Hi,

We see a lot of cases out there where a node was down for longer than the GC grace period, and once that node is back up there are a lot of zombie data issues ... you know the story.
We would like to implement a check that detects this situation and refuses to start the node in the first place, so that no such issues arise and it is up to operators to decide what to do with the node first. We have been exploring a couple of ideas, each with pros and cons, and I would like to know what you think about them.

1) Register a shutdown hook on "drain". This is already there (1). The "drain" method does quite a lot of work and is called on shutdown, so our idea is to have it write a timestamp to system.local, into a new column like "lastly_drained" or something like that, and to read that value on startup.

The disadvantage of this approach, and of every approach based on shutdown hooks, is that it reacts only to SIGTERM and SIGINT. If the node is killed via SIGKILL, the JVM just stops and we have no guarantee that anything is left behind. If the node is killed and the value is not overwritten, the stored timestamp may be older than 10 days on the next startup, and the check would falsely conclude that the node should not be started.

2) Do the check on startup. The node would look at how old all its sstables and commit logs are, and if no file was modified within the last 10 days it would abort the start. There is a pretty big chance that a running node did at least something in 10 days. Nothing needs to be added to system tables or similar, and it would be just another StartupCheck.

The disadvantage of this is that some clusters, dev clusters for example, may run for more than 10 days while doing absolutely nothing at all: nobody interacts with them and nobody repairs them. When nobody talks to these nodes, no files are modified, right?

It seems there is no silver bullet here. What is your opinion on this?
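To make option 2 a bit more concrete, here is a rough sketch of the file-age check in plain Java. It is only an illustration under my own assumptions: StaleNodeCheck, newestModification and shouldAbortStart are hypothetical names, not existing Cassandra code, and a real version would be wired in as a StartupCheck and take the data and commit log directories from the node's configuration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of Cassandra: decides whether a node has been
// idle or down for longer than the allowed period, based on file timestamps.
final class StaleNodeCheck {

    // Returns the most recent modification time of any regular file under the
    // given directories, or empty if no files exist (for example a new node).
    static Optional<Instant> newestModification(List<Path> dirs) throws IOException {
        long newestMillis = 0L;
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // a missing directory contributes nothing
            }
            try (Stream<Path> files = Files.walk(dir)) {
                long dirNewest = files.filter(Files::isRegularFile)
                                      .mapToLong(p -> p.toFile().lastModified())
                                      .max()
                                      .orElse(0L);
                newestMillis = Math.max(newestMillis, dirNewest);
            }
        }
        return newestMillis == 0L
             ? Optional.empty()
             : Optional.of(Instant.ofEpochMilli(newestMillis));
    }

    // Abort only if files exist and the newest one is older than maxDowntime.
    // A node without any files is allowed to start.
    static boolean shouldAbortStart(Optional<Instant> newest, Instant now, Duration maxDowntime) {
        return newest.isPresent()
            && Duration.between(newest.get(), now).compareTo(maxDowntime) > 0;
    }
}
```

A caller would pass the sstable and commit log directories, Instant.now(), and something like Duration.ofDays(10). Note that this sketch has exactly the weakness described above: an idle node that wrote nothing for 10 days is indistinguishable from one that was down.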
Regards

(1) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L786-L799