Hi Andrey,

That's a good point, and you're correct that if the write to the memTable
gets throttled, the addEntry request latency will be affected a lot.
This has actually happened a few times in a production cluster. Normally,
the idea of a Journal is to write data to the write-ahead log first and then
persist the actual data to disk or add it to the memTable. However, my
understanding is that we write the entry to ledgerStorage first in order to
improve tailing-read performance.

In SortedLedgerStorage.java, we first add the entry to the memTable and then
update lastAddConfirmed, which means that a pending long-poll read request or
readLastAddConfirmed request can be satisfied immediately with the latest
entry, before we actually log the entry to the Journal. So a tailing read
doesn't need to wait for any disk operation in BookKeeper, including the
Journal write.

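    public long addEntry(ByteBuffer entry) throws IOException {
        // The entry header carries ledgerId, entryId and lastAddConfirmed.
        long ledgerId = entry.getLong();
        long entryId = entry.getLong();
        long lac = entry.getLong();
        // Reset the position so the whole entry is stored.
        entry.rewind();
        memTable.addEntry(ledgerId, entryId, entry, this);
        // The LAC is updated here, before the Journal write.
        ledgerCache.updateLastAddConfirmed(ledgerId, lac);
        return entryId;
    }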

But thinking about it more, I'm wondering whether it's actually safe to
update the LAC before we write the entry to the Journal. What if we tell the
client the LAC has been updated, but we then fail to write the entry to the
Journal and the Bookie crashes at that point? Would this cause any
inconsistency?
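
To make the scenario concrete, here is a minimal toy sketch of the ordering I
have in mind. It is not BookKeeper code: ToyBookie and all of its fields and
methods are hypothetical names made up for illustration, and it assumes that
only journaled entries survive a crash and that in-memory state is rebuilt by
replaying the journal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names exist in BookKeeper.
class ToyBookie {
    // In-memory state, visible to readers immediately
    // (stands in for the memTable and the cached LAC).
    private final List<long[]> memTable = new ArrayList<>();
    private final Map<Long, Long> lacByLedger = new HashMap<>();

    // Durable state: the only thing that survives a crash in this model.
    private final List<long[]> journal = new ArrayList<>();

    // Step 1 of an add: store the entry and publish the LAC it carries.
    void addToLedgerStorage(long ledgerId, long entryId, long lac) {
        memTable.add(new long[] {ledgerId, entryId, lac});
        lacByLedger.put(ledgerId, lac);
    }

    // Step 2 of an add: append to the journal.
    void logToJournal(long ledgerId, long entryId, long lac) {
        journal.add(new long[] {ledgerId, entryId, lac});
    }

    // What a tailing reader would observe; -1 means no LAC is known.
    long readLastAddConfirmed(long ledgerId) {
        return lacByLedger.getOrDefault(ledgerId, -1L);
    }

    // Simulate a crash and restart: drop in-memory state, replay the journal.
    void crashAndReplay() {
        memTable.clear();
        lacByLedger.clear();
        for (long[] e : journal) {
            addToLedgerStorage(e[0], e[1], e[2]);
        }
    }
}
```

In this model, if an entry goes through addToLedgerStorage but the process
"crashes" before logToJournal, a reader may already have seen the new LAC via
readLastAddConfirmed, while after crashAndReplay the bookie reports the older
one. Whether that window matters in the real system is exactly my question.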

On Mon, May 1, 2017 at 2:13 PM, Andrey Yegorov <andrey.yego...@gmail.com>
wrote:

> Hi,
>
> Looking at the code in Bookie.java I noticed that write to journal (which
> is supposed to be a write-ahead log as I understand) happened after write
> to ledger storage.
> This looks counter-intuitive, can someone explain why is it done in this
> order?
>
> My primary concern is that ledger storage write can be delayed (i.e.
> EntryMemTable's addEntry can do throttleWriters() in some cases) thus
> dragging overall client's view of add latency up even though it is possible
> that journal's write (i.e. in case of dedicated journal disk) will complete
> faster.
>
>     private void addEntryInternal(LedgerDescriptor handle, ByteBuffer entry,
>                                   WriteCallback cb, Object ctx)
>             throws IOException, BookieException {
>         long ledgerId = handle.getLedgerId();
>         entry.rewind();
>         // ledgerStorage.addEntry() is happening here
>         long entryId = handle.addEntry(entry);
>
>         entry.rewind();
>         writeBytes.add(entry.remaining());
>
>         LOG.trace("Adding {}@{}", entryId, ledgerId);
>         // journal add entry is happening here
>         // callback/response to client is sent after journal add is done.
>         journal.logAddEntry(entry, cb, ctx);
>     }
>
>
>
> ----------
> Andrey Yegorov
>
