> This sounds strange and you are not the first one that is asking this
> question.

If the order is changed, so that the write to the journal happens ahead of the write to the ledger, will it make any difference?

> The acknowledgement is sent to the client only after a successful fdatasync
> on the journal (if you do not ask for DEFERRED_SYNC or disable fsyncs
> explicitly)

Ah, I missed the callback passed in the QueueEntry. The flush implementation, though, seems to only write to the file (BufferedChannel.flush); it doesn't seem to do an actual fileChannel.force?

> it is super fast and it
> guarantees the data have been persisted durably.

Just curious, are there any throughput/latency tests to look at?

Thanks,
Unmesh

On Tue, Jun 2, 2020 at 7:23 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> On Tue, Jun 2, 2020, 15:20 Unmesh Joshi <unmeshjo...@gmail.com> wrote:
>
> > Hi,
> >
> > I was going through bookkeeper code, particularly to see when and how
> > transaction logs are written and flushed to disk.
> > Just curious to understand why, in the Bookie.addEntryInternal method,
> > writes to journal happen after the writes to ledger. (
> > https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java
> > )
> > Also, journal writes are not flushed to disk synchronously, as they
> > happen in their own dedicated thread (and can also be done in batches).
> > So I had two questions.
> > 1. Why journal writes are not done before the writes to ledgers
>
> This sounds strange and you are not the first one that is asking this
> question.
> Basically entries in BK are immutable and when the bookie restarts it
> replays the journal.
> The LAC protocol shields reader clients from reading entries that have not
> been acknowledged to the writer.
>
> > 2. Why not to wait till journal writes are successful (even if not synced
> > to disk may be) before returning the response.
>
> The acknowledgement is sent to the client only after a successful fdatasync
> on the journal (if you do not ask for DEFERRED_SYNC or disable fsyncs
> explicitly)
> This is basically one of the core features of BK: it is super fast and it
> guarantees the data have been persisted durably.
>
> Enrico
>
> > If these things are not done, there is always a risk of losing data in case
> > of server or disk crash?
> >
> > Thanks,
> > Unmesh
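
For readers following the thread, the difference between writing buffered data to a file channel and forcing it to disk, and the idea of acknowledging a batch only after the force, can be sketched with plain java.nio. This is a minimal illustration and not BookKeeper's code: the class JournalSyncSketch, the AckCallback interface, and the methods append and syncAndAck are hypothetical names. It assumes that FileChannel.force(false), which requests that file content (not necessarily metadata) reach the storage device, plays the role of the fdatasync mentioned above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class JournalSyncSketch {
    /** Hypothetical stand-in for the callback carried by a queued journal entry. */
    public interface AckCallback {
        void onDurable(long entryId);
    }

    private final FileChannel channel;
    private final List<Long> pendingIds = new ArrayList<>();
    private final List<AckCallback> pendingCallbacks = new ArrayList<>();

    public JournalSyncSketch(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /**
     * Writes the payload to the channel. After this returns, the bytes have
     * been handed to the operating system but may still sit in its page
     * cache, so the callback is only queued, not invoked.
     */
    public void append(long entryId, byte[] payload, AckCallback callback) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        pendingIds.add(entryId);
        pendingCallbacks.add(callback);
    }

    /**
     * Forces file content to the storage device, then acknowledges every
     * entry written since the previous force (a simple group commit).
     */
    public void syncAndAck() throws IOException {
        channel.force(false); // false: content only, metadata not required
        for (int i = 0; i < pendingIds.size(); i++) {
            pendingCallbacks.get(i).onDurable(pendingIds.get(i));
        }
        pendingIds.clear();
        pendingCallbacks.clear();
    }

    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("journal-sketch", ".log");
        JournalSyncSketch journal = new JournalSyncSketch(file);
        List<Long> acked = new ArrayList<>();

        journal.append(1L, "entry-1".getBytes(StandardCharsets.UTF_8), id -> acked.add(id));
        journal.append(2L, "entry-2".getBytes(StandardCharsets.UTF_8), id -> acked.add(id));
        System.out.println("acked before force: " + acked);

        journal.syncAndAck();
        System.out.println("acked after force: " + acked);

        journal.close();
        Files.delete(file);
    }
}
```

In this sketch, one force covers every entry appended since the last one, which is how batching can keep throughput high while still delaying each acknowledgement until the data has been forced to disk.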