Hi, I am working with my colleagues on an implementation that relaxes the constraint that every acknowledged entry must have been successfully written and fsynced to disk at the journal level.
The idea is to add a flag to addEntry that asks for the acknowledgement to be sent not after the fsync in the journal, but as soon as the data has been successfully written and flushed to the OS. I have the requirement that if an entry requires sync, all the entries successfully sent 'before' that entry (causality) are synced too, even if they have been added with the new relaxed durability flag.

Imagine a database transaction log: during a transaction I will write every data change to the WAL with the new flag, and only the commit command will be added with the sync requirement. All the changes inside the scope of the transaction have a meaning only if the transaction is committed, so it is important that the commit entry is not lost, and if that entry is not lost, none of the other entries of the same transaction are lost either.

I have another use case. In another project I am storing binary objects in BK, and I have to obtain great performance even on single-disk bookie layouts (journal + data + index on the same partition). In this project it is acceptable to compensate for the risk of not doing fsync by requesting enough replication. IMHO it would be somewhat like the Kafka idea of durability: as far as I know, Kafka by default does not impose fsync but leaves it all to the OS and to the fact that there is a minimum configurable number of in-sync replicas.

There are many open points, already suggested by Matteo, JV and Sijie:
- LAC protocol?
- replication in case of lost entries?
- under production load, mixing non-synced entries with synced entries will not give much benefit

For the LAC protocol I think there is no impact: the LastAddConfirmed is the max entry id which is known to have been acknowledged to the writer, so durability is not a concern. You can lose entries even with fsync, just by losing all the disks which contain the data. Without fsync it is just more probable.
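To make the causality requirement and the LAC argument concrete, here is a minimal Java sketch under simplifying assumptions: a single append-only journal file, one writer thread, no group commit, and entries added in increasing entry id order. The class `RelaxedJournal`, the `addEntry(entryId, payload, sync)` signature and the getters are hypothetical names for illustration, not the BookKeeper API. The point it shows is that with one ordered journal file, causality comes almost for free: `FileChannel.force()` flushes every byte written to the file so far, so forcing on a sync entry also makes all the earlier relaxed entries durable, while the LAC keeps following acknowledgements rather than fsyncs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical model of relaxed durability, NOT the BookKeeper API.
 * Assumptions: one append-only journal file, a single writer thread,
 * no group commit, entries added in increasing entryId order.
 */
public class RelaxedJournal implements AutoCloseable {
    private final FileChannel channel;
    private long lastWrittenEntryId = -1; // highest entry handed to the OS
    private long lastDurableEntryId = -1; // highest entry covered by an fsync
    private long lastAddConfirmed = -1;   // highest entry acknowledged to the writer

    public RelaxedJournal(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Returns when the entry may be acknowledged under the requested durability. */
    public void addEntry(long entryId, byte[] payload, boolean sync) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + payload.length);
        buf.putLong(entryId).putInt(payload.length).put(payload).flip();
        while (buf.hasRemaining()) {
            channel.write(buf); // bytes reach the OS page cache, not necessarily the disk
        }
        lastWrittenEntryId = entryId;
        if (sync) {
            // force() flushes everything written to this file so far, so all
            // earlier relaxed entries become durable too (causality).
            channel.force(true);
            lastDurableEntryId = lastWrittenEntryId;
        }
        // The LAC follows acknowledgements, not fsyncs: relaxed entries advance it too.
        lastAddConfirmed = Math.max(lastAddConfirmed, entryId);
    }

    public long getLastWrittenEntryId() { return lastWrittenEntryId; }
    public long getLastDurableEntryId() { return lastDurableEntryId; }
    public long getLastAddConfirmed() { return lastAddConfirmed; }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("journal", ".log");
        try (RelaxedJournal journal = new RelaxedJournal(tmp)) {
            // Changes inside the transaction: relaxed durability.
            journal.addEntry(0, "set x=1".getBytes(), false);
            journal.addEntry(1, "set y=2".getBytes(), false);
            // Commit: requires sync, which also covers entries 0 and 1.
            journal.addEntry(2, "commit".getBytes(), true);
            System.out.println(journal.getLastDurableEntryId() + " " + journal.getLastAddConfirmed());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Running `main` prints `2 2`: the commit entry and both earlier relaxed entries are covered by the single fsync. One caveat of this model: if the journal file were shared by several ledgers, as a bookie journal is, a sync entry on one ledger would also force the pending relaxed entries of the others, which is one way to look at the mixed-workload concern.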
Replication: maybe we should record in the ledger metadata that the ledger allows this feature, and deal with it there. But I am not sure; I have to better understand how LedgerHandleAdv deals with sparse entry ids during the re-replication process.

Mixed workload: honestly, I would like to add this feature to limit the number of fsyncs, and I expect lots of bursts of unsynced entries interleaved with a few synced entries. I know that this feature is not to be encouraged in general, but only for specific cases, like the stories of LedgerHandleAdv or readUnconfirmedEntries.

If this makes sense to you, I will create a BP and attach a first patch.

Enrico

--
Enrico Olivelli