Relax durability

Enrico Olivelli Thu, 17 Aug 2017 04:43:19 -0700

Hi,
I am working with my colleagues on an implementation to relax the
constraint that every acknowledged entry must have been successfully
written and fsynced to disk at the journal level.


The idea is to have a flag in addEntry to ask for the acknowledgement not
after the fsync in the journal but as soon as the data has been
successfully written and flushed to the OS.

I have the requirement that if an entry requires sync, all the entries
successfully sent 'before' that entry (causality) are synced too, even if
they have been added with the new relaxed durability flag.
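
To make the causality requirement concrete, here is a minimal sketch in
Java. It is not BookKeeper's journal code: the class RelaxedJournalSketch,
its addEntry(data, sync) signature and the two counters are hypothetical
names used only for illustration. It relies on the fact that an fsync on a
single append-only file covers every byte written before it, so a synced
entry also makes all earlier relaxed entries durable.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not BookKeeper's journal: one append-only file,
// with a per-entry flag that decides whether to fsync before acknowledging.
class RelaxedJournalSketch implements AutoCloseable {
    private final FileChannel channel;
    private long lastWrittenId = -1;  // highest entry id handed to the OS
    private long lastSyncedId = -1;   // highest entry id covered by an fsync

    RelaxedJournalSketch(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Appends one length-prefixed entry and returns its id.
    // Returning from this method stands in for the ack to the writer.
    synchronized long addEntry(byte[] data, boolean sync) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + data.length);
        buf.putInt(data.length).put(data).flip();
        while (buf.hasRemaining()) {
            channel.write(buf);   // data reaches the OS, not necessarily the disk
        }
        long id = ++lastWrittenId;
        if (sync) {
            channel.force(false); // fsync covers every byte written so far,
            lastSyncedId = id;    // so earlier relaxed entries are synced too
        }
        return id;
    }

    synchronized long lastWrittenId() { return lastWrittenId; }
    synchronized long lastSyncedId() { return lastSyncedId; }

    @Override
    public void close() throws IOException { channel.close(); }
}
```

A real journal would batch entries and group the fsync calls; the sketch
only shows where the flag changes the moment of the acknowledgement.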

Imagine a database transaction log: during a transaction I will write every
change to data to the WAL with the new flag, and only the commit
transaction command will be added with the sync requirement. The idea is
that all the changes inside the scope of the transaction have a meaning only
if the transaction is committed, so it is important that the commit entry
is not lost, and that if that entry is not lost none of the other entries of
the same transaction are lost either.
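
The transaction-log argument can be simulated in a few lines of Java. This
is only an in-memory model under a worst-case assumption (a crash loses
everything written after the last synced entry); the names WalCrashSketch,
Entry, crashAndRecover and replay are hypothetical and do not exist in
BookKeeper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation: unsynced entries live only in the OS cache,
// and a crash keeps just the prefix covered by the last synced entry.
class WalCrashSketch {
    static final class Entry {
        final long txId;
        final String op;       // the data change, null for a commit entry
        final boolean commit;
        Entry(long txId, String op, boolean commit) {
            this.txId = txId;
            this.op = op;
            this.commit = commit;
        }
    }

    private final List<Entry> log = new ArrayList<>();
    private int durableCount = 0; // entries [0, durableCount) survive a crash

    void add(Entry e, boolean sync) {
        log.add(e);
        if (sync) {
            durableCount = log.size(); // the sync covers all earlier entries
        }
    }

    // Worst-case crash: everything after the last synced entry is lost.
    List<Entry> crashAndRecover() {
        return new ArrayList<>(log.subList(0, durableCount));
    }

    // Replays only the changes of transactions whose commit entry survived.
    static List<String> replay(List<Entry> recovered) {
        Set<Long> committed = new HashSet<>();
        for (Entry e : recovered) {
            if (e.commit) committed.add(e.txId);
        }
        List<String> applied = new ArrayList<>();
        for (Entry e : recovered) {
            if (!e.commit && committed.contains(e.txId)) applied.add(e.op);
        }
        return applied;
    }
}
```

With two relaxed changes followed by a synced commit for transaction 1, and
one relaxed change for transaction 2 that never commits, recovery applies
the two changes of transaction 1 and nothing from transaction 2.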

I have another use case. In another project I am storing binary objects
in BK and I have to obtain great performance even on single-disk bookie
layouts (journal + data + index on the same partition). In this project it
is acceptable to compensate for the risk of not doing fsync by requesting
enough replication.
IMHO it will be somewhat like the Kafka idea of durability: as far as I
know, Kafka by default does not impose fsync but leaves it all to the OS and
to the fact that there is a configurable minimum number of replicas which
are in sync.

There are many open points, already suggested by Matteo, JV and Sijie:
- LAC protocol?
- replication in case of lost entries?
- under production load, mixing non-synced entries with synced entries
will not give much benefit


For the LAC protocol I think that there is no impact: the point is that the
LastAddConfirmed is the max entryid which is known to have been
acknowledged to the writer, so durability is not a concern. You can lose
entries even with fsync, just by losing all the disks which contain the
data. Without fsync it is just more probable.

Replication: maybe we should write in the ledger metadata that the ledger
allows this feature and deal with it. But I am not sure; I have to
understand better how LedgerHandleAdv deals with sparse entryids inside the
re-replication process.

Mixed workload: honestly I would like to add this feature to limit the
number of fsyncs, and I expect to have lots of bursts of unsynced entries
interleaved with a few synced entries. I know that this feature is
not to be encouraged in general but only for specific cases, like the
stories of LedgerHandleAdv or readUnconfirmedEntries.

If this makes sense to you, I will create a BP and attach a first patch.

Enrico





-- 


-- Enrico Olivelli
