> On Dec 11, 2023, at 11:27 AM, Raymond Huffman <raymondmhuff...@gmail.com> wrote:
>
> On our fork of Cassandra, we've implemented some custom behavior for handling
> CommitLog and SSTable Corruption errors. Specifically, if a node detects one
> of those errors, we want the node to stop itself, and if the node is
> restarted, we want initialization to fail.

This is the correct behavior if you can reliably detect disk / memory failure, which is usually the cause of corruption.

> FSErrorHandler, and the error handler that's currently implemented at
> org.apache.cassandra.db.commitlog.CommitLog#handleCommitError via config in
> the same way one can provide custom Partitioners and
> Authenticators/Authorizers.

How would you implement this custom FSErrorHandler? Would it vary significantly between operators of Cassandra? If improperly implemented, it may lead to serious outages.

> Would you take as a contribution one of the following?
> 1. user provided implementations of FSErrorHandler and CommitLogErrorHandler,
> set via config; and/or
> 2. new commit failure and disk failure policies that write a poison pill file
> to disk and fail on startup if that file exists

Maybe this can be added as a feature to Cassandra without the need to customize it or make it pluggable. It appears to be useful as described. If you have a branch with the proposed behavior, it might make it easier to clarify any questions.

Dinesh
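
For reference, the poison pill idea in option 2 of the quoted proposal could look roughly like the sketch below. This is only an illustration under stated assumptions, not the behavior of the fork or of Cassandra itself: the class name PoisonPill, the marker file name, and both method names are hypothetical, and the sketch deliberately uses plain java.nio instead of any Cassandra class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not part of Cassandra. */
final class PoisonPill {
    // Hypothetical marker file name, chosen for this sketch only.
    static final String MARKER_NAME = "corruption.poison";

    private final Path marker;

    PoisonPill(Path dataDir) {
        this.marker = dataDir.resolve(MARKER_NAME);
    }

    /** Intended for a corruption error path: record the reason before the node stops. */
    void write(String reason) throws IOException {
        // SYNC asks the OS to flush the content to storage before returning.
        Files.write(marker, reason.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE, StandardOpenOption.SYNC);
    }

    /** Intended for early startup: refuse to initialize while the marker exists. */
    void checkOnStartup() {
        if (Files.exists(marker)) {
            throw new IllegalStateException("Poison pill found at " + marker
                    + "; refusing to start until an operator removes it");
        }
    }
}
```

In this sketch an operator clears the condition by deleting the marker file after replacing the faulty hardware. Where the marker should live (a data directory that may itself be on the failing disk, or a separate location) is an open design question that the sketch does not settle.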