Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov Thu, 15 Mar 2018 16:24:06 -0700

Igniters and especially Native Persistence experts,

We decided to change the default WAL mode from DEFAULT (FSYNC) to LOG_ONLY in the 2.4 release. That was a difficult decision: we sacrificed power loss / OS crash tolerance but gained a significant performance boost. From my perspective, LOG_ONLY is the right choice, but it still lacks some critical features that a default mode should have.

Let's focus on the exact guarantees each mode provides. The documentation explains them in a pretty simple manner: LOG_ONLY - writes survive process crash, FSYNC - writes survive power loss scenarios. I have to note that the documentation doesn't describe what exactly can happen to a node in LOG_ONLY mode in case of a power loss / OS crash. Basically, there are two possible negative outcomes: loss of the last several updates (exactly what can happen in BACKGROUND mode in case of a process crash) and total storage corruption (not only the last updates, but all data will be lost). I did some quick research on this and came to the conclusion that power loss in LOG_ONLY can lead to storage corruption. There are several explanations for this:

1) IgniteWriteAheadLogManager#fsync is kind of broken - it doesn't perform an actual fsync unless the current WAL mode is FSYNC. We call this method when we write a checkpoint marker to the WAL. Since the part of the WAL before the checkpoint marker may not be synced, the "physical" records that are necessary for crash recovery in the "Node stopped in the middle of checkpoint" scenario may be corrupted after power loss. If that happens, we won't be able to recover internal data structures, which means loss of all data.

2) We don't fsync WAL archive files unless the current WAL mode is FSYNC. The WAL archive can contain necessary "physical" records as well, which leads us to the case described above.

3) We do perform fsync on rollover (switch of the current WAL segment) in all modes, but only when there's enough space to write the switch segment record - see FileWriteHandle#close. So there's a small chance that we'll skip the fsync and bump into the same case.
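
To make the difference between "written" and "synced" concrete, here is a minimal sketch of a segment writer that forces data to disk at the checkpoint marker and unconditionally on close (rollover). The class and method names are hypothetical and are not Ignite's FileWriteHandle or IgniteWriteAheadLogManager; only the java.nio calls are real.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical WAL segment writer, for illustration only. */
final class SegmentWriterSketch implements AutoCloseable {
    private final FileChannel ch;

    SegmentWriterSketch(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE,
            StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Ordinary record: handed to the OS page cache, no fsync. */
    void append(String rec) throws IOException {
        writeFully(rec);
    }

    /** Checkpoint marker: write it, then sync everything written so far. */
    void appendCheckpointMarker(String marker) throws IOException {
        writeFully(marker);
        ch.force(false); // force file content; metadata-only changes are not forced
    }

    /** Rollover: sync unconditionally before the segment is closed. */
    @Override public void close() throws IOException {
        ch.force(false);
        ch.close();
    }

    private void writeFully(String s) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining())
            ch.write(buf);
    }

    /** Writes a tiny segment to a temp file and returns its content. */
    static String demo() {
        try {
            Path f = Files.createTempFile("wal-sketch", ".seg");
            try (SegmentWriterSketch w = new SegmentWriterSketch(f)) {
                w.append("rec1;");
                w.appendCheckpointMarker("CP;");
                w.append("rec2;");
            }
            String content = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
            Files.delete(f);
            return content;
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch only "rec2;" is exposed to loss on power failure before close; everything up to and including the checkpoint marker has been forced.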

Enforcing fsync in these three situations will give us a guarantee that LOG_ONLY survives power loss scenarios, with the possibility of losing the last several updates. There can still be a total binary mess in the last part of the WAL, but as long as we perform a CRC check during WAL replay, we'll detect the start of that mess. Extra fsyncs may cause slight performance degradation - all writes will have to wait for one fsync on every rollover and checkpoint. It's still much faster than an fsync on every write to the WAL - I expect a drop of a few percent (0-5%) compared to the current LOG_ONLY. But degradation is degradation, and LOG_ONLY mode without extra fsyncs makes sense as well - that's why we need to introduce "LOG_ONLY + extra fsyncs" as a separate WAL mode. I think we should make it the default - it provides a significant durability bonus for the cost of one extra fsync for each WAL segment written.
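
The idea of detecting the start of the mess by CRC can be sketched as follows. The record layout here ([int length][payload][int CRC32 of payload]) is an assumption made for the example, not Ignite's actual WAL record format, and the class name is hypothetical. Replay reads records from the beginning and stops at the first one whose length or checksum doesn't validate.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical CRC-guarded log, for illustration only. */
final class WalCrcReplaySketch {
    /** Encodes one record as [int length][payload][int crc32(payload)]. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 4);
        buf.putInt(payload.length).put(payload).putInt((int) crc.getValue());
        return buf.array();
    }

    /** Concatenates encoded records into one log image. */
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts)
            total += p.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : parts)
            buf.put(p);
        return buf.array();
    }

    /** Returns the payloads of all records before the first invalid one. */
    static List<byte[]> replay(byte[] wal) {
        List<byte[]> good = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(wal);
        while (buf.remaining() >= 8) {
            int len = buf.getInt();
            // A zero, negative or oversized length means a torn or unwritten tail.
            if (len <= 0 || len > buf.remaining() - 4)
                break;
            byte[] payload = new byte[len];
            buf.get(payload);
            int stored = buf.getInt();
            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);
            if ((int) crc.getValue() != stored)
                break; // start of the corrupted part: stop replay here
            good.add(payload);
        }
        return good;
    }
}
```

Everything after the first failed check is discarded, which corresponds to the "last updates loss" outcome rather than full storage corruption, provided the records before that point were synced.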


To sum it up, I propose a new set of possible WAL modes:
NONE - both process crash and power loss can lead to corruption

BACKGROUND - process crash can lead to last updates loss, power loss can lead to corruption

LOG_ONLY - writes survive process crash, power loss can lead to corruption

LOG_ONLY_SAFE (default) - writes survive process crash, power loss can lead to last updates loss

FSYNC - writes survive both process crash and power loss

Thoughts?


Best Regards,
Ivan Rakov
