I found the reason for my server freeze:

The COMMIT-LOG-WRITER thread is dead, so the blocking queue in
PeriodicCommitLogExecutorService fills up, and then all mutationStage jobs
are stuck trying to flush their mutations.
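
To illustrate the stall, here is a minimal sketch in plain Java. It is not Cassandra code: the class and method names are made up for the example, and it only assumes a bounded queue that nothing drains. A timed offer() stands in for the blocking put(), so the demo terminates instead of hanging the way the real producers do.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Cassandra.
class StalledQueueDemo {
    /**
     * Submits `attempts` tasks to a bounded queue that has no consumer
     * (modelling a dead writer thread) and returns how many were accepted.
     */
    static int submitWithoutConsumer(int capacity, int attempts) throws InterruptedException {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // A plain put() would block forever once the queue is full;
            // the timed offer() gives up after 10 ms so the demo can finish.
            if (queue.offer(() -> {}, 10, TimeUnit.MILLISECONDS)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        // Only `capacity` submissions get through; the rest would block.
        System.out.println(submitWithoutConsumer(4, 10));
    }
}
```

With a capacity of 4 and 10 attempts, only the first 4 submissions are accepted. Every later producer would wait indefinitely if it used put(), which matches the symptom above.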

The COMMIT-LOG-WRITER thread died because at one point the disk was full.
I freed up disk space (by deleting other files, not Cassandra files),
but since the thread was already gone, the system stayed stuck,
so I had to restart the server.


Is it better to let the WRITER thread handle file system exceptions, or
to let it die? Granted, letting the disk fill up is not
good practice, but having the system proceed once disk space is freed
seems a more natural expectation.
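
To make the two options concrete, here is a rough sketch in plain Java. It is not the actual COMMIT-LOG-WRITER implementation: WriteTask, drain and sampleQueue are hypothetical names, and the loop runs synchronously to keep the example small. The sketch simply skips a failed write; a real writer would have to decide whether to retry it or report the failure to the caller.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Cassandra code.
class ResilientWriterSketch {
    /** Stand-in for a commit log write that can fail with an I/O error. */
    interface WriteTask {
        void run() throws IOException;
    }

    /**
     * Drains the queue and returns the number of tasks that completed.
     * If surviveErrors is false, the first IOException ends the loop,
     * which models the writer thread dying and leaving work behind.
     */
    static int drain(BlockingQueue<WriteTask> queue, boolean surviveErrors) {
        int completed = 0;
        WriteTask task;
        while ((task = queue.poll()) != null) {
            try {
                task.run();
                completed++;
            } catch (IOException e) {
                if (!surviveErrors) {
                    return completed; // writer "dies"; remaining tasks stay queued
                }
                // Otherwise note the failure and keep serving the queue.
                System.err.println("write failed, continuing: " + e.getMessage());
            }
        }
        return completed;
    }

    /** One successful write, one simulated disk-full failure, then two more writes. */
    static BlockingQueue<WriteTask> sampleQueue() {
        BlockingQueue<WriteTask> q = new LinkedBlockingQueue<>();
        q.add(() -> {});
        q.add(() -> { throw new IOException("No space left on device"); });
        q.add(() -> {});
        q.add(() -> {});
        return q;
    }

    public static void main(String[] args) {
        System.out.println(drain(sampleQueue(), false)); // writer dies on the error
        System.out.println(drain(sampleQueue(), true));  // writer survives the error
    }
}
```

When the loop gives up on the first error, the two writes queued behind the failure are never processed. When it catches the exception and continues, they complete as soon as writes succeed again, which is the behaviour I would expect after freeing the disk.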

Thanks
Yang
