David Alves created KUDU-2195:
---------------------------------
Summary: Enforce durability happened before relationships on
multiple disks
Key: KUDU-2195
URL: https://issues.apache.org/jira/browse/KUDU-2195
Project: Kudu
Issue Type: Bug
Components: consensus, tablet
Reporter: David Alves
When using weaker durability semantics (e.g. when log_force_fsync is off) we
should still enforce certain happened-before relationships, which are not
currently enforced when the wal and data are on different disks.
The two cases that come to mind where this is relevant are:
1) cmeta (c) -> wal (w) : We flush cmeta before flushing the wal (for instance
on term change) with the intention that either {}, {c} or {c, w} were made
durable.
2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to make
sure that all commit messages that refer to on-disk row sets (and deltas)
are on disk before the row sets they point to, i.e. with the intention that
either {}, {w} or {w, t} were made durable.
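
To make the intended invariant concrete, here is a minimal sketch (not Kudu
code) that treats each case as an ordered chain and accepts only durable sets
that are a prefix of that chain. The function names and the single-letter
labels are made up for illustration.

```python
def allowed_durable_states(order):
    """Return every durable set permitted by a happened-before chain.

    `order` lists the items in the order they must become durable,
    e.g. ("c", "w") for cmeta -> wal. Only prefixes of the chain are allowed.
    """
    return [frozenset(order[:i]) for i in range(len(order) + 1)]


def violates_order(order, durable):
    """True if `durable` holds a later item without all of the earlier ones."""
    return frozenset(durable) not in allowed_durable_states(order)


# Case 1: cmeta (c) -> wal (w). {w} alone is the state we must never see.
print(violates_order(("c", "w"), {"w"}))  # True
# Case 2: wal (w) -> tablet meta (t). {w} alone is fine here.
print(violates_order(("w", "t"), {"w"}))  # False
```

After a crash, a durable state of {w} in case 1 or {t} in case 2 is exactly the
outcome that separate disks without an fsync can produce.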
With strong durability semantics these are always made durable in the right
order. With weaker semantics that is not the case, though. If the same disk is
used for both the wal and data, the invariants are still preserved, as buffers
will be flushed in the right order. But if different disks are used for the wal
and data, the invariants are not necessarily preserved (this affects cmeta too,
because cmeta is stored with the data):
Case 1) is actually safe on ext4, because we perform an fsync when flushing
cmeta (indirectly: rename() implies fsync in ext4). But it is not safe on xfs.
Case 2) is not safe on either filesystem.
--- Possible solutions --
For 1): Store cmeta with the wal, or always fsync cmeta explicitly.
For 2): Store tablet meta with the wal, or always fsync the wal before flushing
tablet meta.
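
As a rough illustration of the "always fsync the wal before flushing tablet
meta" option, here is a minimal sketch using plain POSIX-style calls from
Python. It is not Kudu code: the helper names, the paths and the ".tmp" suffix
are hypothetical, and it assumes a Linux-like system where a directory can be
opened read-only and fsynced.

```python
import os


def fsync_path(path):
    """Open an existing file or directory and fsync it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def flush_tablet_meta(wal_path, meta_path, meta_bytes):
    """Make the wal durable first, then publish the new tablet metadata."""
    # 1. Force the wal to disk so that {t} can never be durable without {w},
    #    even if the wal and the metadata live on different disks.
    fsync_path(wal_path)

    # 2. Write the new metadata to a temporary file and fsync it explicitly,
    #    instead of relying on filesystem-specific rename() behaviour.
    tmp_path = meta_path + ".tmp"  # hypothetical naming scheme
    with open(tmp_path, "wb") as f:
        f.write(meta_bytes)
        f.flush()
        os.fsync(f.fileno())

    # 3. Atomically replace the old metadata file with the new one.
    os.replace(tmp_path, meta_path)

    # 4. fsync the parent directory so the rename itself is durable.
    fsync_path(os.path.dirname(os.path.abspath(meta_path)))
```

The same write, fsync, rename pattern would apply to cmeta in case 1) if it is
always fsynced explicitly. The cost is an extra fsync on the wal for every
tablet metadata flush, even when log_force_fsync is off.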