And for completeness, a sample stack trace:

ERROR [2021-07-21T02:11:01.994Z] org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay. Commit disk failure policy is stop_on_startup; terminating thread (throwable0_message: Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log)
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log
        at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)
        at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:519)
        at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:401)
        at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
        at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:175)
        at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:155)
        at org.apache.cassandra.service.CassandraDaemon.recoverCommitlogAndCompleteSetup(CassandraDaemon.java:296)
        at org.apache.cassandra.service.CassandraDaemon.completeSetupMayThrowSstableException(CassandraDaemon.java:289)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:741)
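
To make the question concrete, here is a rough sketch of the behavior I'm
asking about. It is NOT Cassandra's code or on-disk format: the class name
ReplaySketch, the record layout (4-byte length, payload, 8-byte CRC32) and the
tolerateBadChecksum flag are all made up for illustration. It only shows the
difference between aborting startup on the first checksum mismatch and
stopping replay at that point while keeping what was already replayed.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration only; not Cassandra's commit log format or replayer.
class ReplaySketch {
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // Appends one record: 4-byte length, payload, 8-byte CRC32 of the payload.
    static void append(ByteBuffer out, byte[] payload) {
        out.putInt(payload.length).put(payload).putLong(checksum(payload));
    }

    // Replays records in order and returns how many passed their checksum.
    // On a bad or truncated record: stop quietly if tolerateBadChecksum is
    // set, otherwise throw (roughly what a fail-fast startup policy does).
    static int replay(ByteBuffer in, boolean tolerateBadChecksum) {
        int replayed = 0;
        while (in.remaining() >= 4) {
            int start = in.position();
            int len = in.getInt();
            if (len <= 0) break; // treat as end of written data
            boolean truncated = in.remaining() < (long) len + 8;
            boolean bad = truncated;
            if (!truncated) {
                byte[] payload = new byte[len];
                in.get(payload);
                bad = checksum(payload) != in.getLong();
            }
            if (bad) {
                if (tolerateBadChecksum) break;
                throw new IllegalStateException("Checksum failure at " + start);
            }
            replayed++;
        }
        return replayed;
    }

    // Three one-byte records, with the third payload overwritten to simulate
    // a write that was interrupted by a hard restart.
    static ByteBuffer corruptedSample() {
        ByteBuffer buf = ByteBuffer.allocate(64);
        append(buf, new byte[] {'a'});
        append(buf, new byte[] {'b'});
        append(buf, new byte[] {'c'});
        buf.put(30, (byte) 'x'); // third record starts at 26; payload at 30
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        System.out.println(replay(corruptedSample(), true)); // prints 2
        replay(corruptedSample(), false); // throws IllegalStateException
    }
}
```

With the tolerant flag the first two records are replayed and the damaged
third one is skipped; without it, startup fails the way the trace above does.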


On Mon, Jul 26, 2021 at 6:08 PM Leon Zaruvinsky <leonzaruvin...@gmail.com>
wrote:

> Currently we're using batch commitlog sync:
>
>     commitlog_sync: batch
>     commitlog_sync_batch_window_in_ms: 2
>     commitlog_segment_size_in_mb: 32
>
> durable_writes is also true.
>
> Unfortunately we are still using Cassandra 2.2.x :( Though I'd be curious
> if much in this space has changed since then (I've looked through the
> changelogs and nothing stood out).
>
> On Mon, Jul 26, 2021 at 5:20 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> What commitlog settings are you using?
>>
>> Default is periodic with 10s sync. That leaves you a 10s window on hard
>> poweroff/crash.
>>
>> I would also expect Cassandra to clean up and start cleanly. Which version
>> are you running?
>>
>>
>>
>> On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky <leonzaruvin...@gmail.com>
>> wrote:
>>
>>> Hi Cassandra community,
>>>
>>> We (and others) regularly run into commit log corruptions that are
>>> caused by Cassandra, or the underlying infrastructure, being hard
>>> restarted.  I suspect that this is because it happens in the middle of a
>>> commitlog file write to disk.
>>>
>>> Could anyone point me at resources / code to understand why this is
>>> happening?  Shouldn't Cassandra hold off on acking writes until the
>>> commitlog is safely written to disk?  I would expect that on startup,
>>> Cassandra should be able to clean up bad commitlog files and recover
>>> gracefully.
>>>
>>> I've seen various references online to this issue as something that will
>>> be fixed in the future - so I'm curious if there is any movement or
>>> thoughts there.
>>>
>>> Thanks a bunch,
>>> Leon
>>>
>>
