On Mon, May 1, 2017 at 5:56 PM, Sijie Guo <guosi...@gmail.com> wrote:
> I don't think this is an inconsistent issue. The in memory update is
> updating lac not current entry. Even the entry is added into memory but
> this entry will not be readable after lac is advanced, lac is advanced only
> after the next entry is added which happened after current entry is acked.
>

That is not true. You are talking about piggy-backed LAC only. But with
Explicit LAC you don't need next entry to move LAC on bookie.

> So adding the entry to memory doesn't expose any consistency issue.
>
> On May 1, 2017 5:44 PM, "Venkateswara Rao Jujjuri" <jujj...@gmail.com>
> wrote:
>
> On Mon, May 1, 2017 at 2:31 PM, Yiming Zang <yz...@twitter.com.invalid>
> wrote:
>
> > Hi Andrey,
> >
> > That's a good point, and you're actually correct that if write to memTable
> > got throttled somehow, the addEntry request latency will be affected a lot.
> > This actually happens a few times in production cluster. Normally, the idea
> > of using Journal is to write data to the write-ahead log and then persist
> > the actual data to disks or add to memTable. However, my understanding of
> > why we choose to write entry to ledgerStorage first is to improve the
> > tailing-read performance.
> >
> > In SortedLedgerStorage.java, we first add entry to memTable and then we
> > update lastAddConfirmed, which means if there's a long poll read request or
> > readLastAddConfirmed request, it will immediately get satisfied for the
> > latest entry before we actually log the entry into Journal. So tailing-read
> > doesn't actually need to wait for any disk operation in Bookkeeper
> > including Journal operation.
> >
> >     public long addEntry(ByteBuffer entry) throws IOException {
> >         long ledgerId = entry.getLong();
> >         long entryId = entry.getLong();
> >         long lac = entry.getLong();
> >         entry.rewind();
> >         memTable.addEntry(ledgerId, entryId, entry, this);
> >         ledgerCache.updateLastAddConfirmed(ledgerId, lac);
> >         return entryId;
> >     }
> >
> > But thinking about here, I'm wondering if it's actually safe to update the
> > LAC before we write the entry to Journal. What if we tell the client the
> > LAC has been updated but we actually failed to write the entry to Journal
> > and Bookie crashed at that time? Would this bring any inconsistency issue?
> >
>
> Good point. This is indeed an inconsistency issue. BK guarantees "if you
> read once you can read it all the time".
> If it is really done for LAC it is not really good idea. Unless I am
> missing something, this must be changed ASAP.
>
> Thanks,
> JV
>
>
> >
> > On Mon, May 1, 2017 at 2:13 PM, Andrey Yegorov <andrey.yego...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Looking at the code in Bookie.java I noticed that write to journal (which
> > > is supposed to be a write-ahead log as I understand) happened after write
> > > to ledger storage.
> > > This looks counter-intuitive, can someone explain why is it done in this
> > > order?
> > >
> > > My primary concern is that ledger storage write can be delayed (i.e.
> > > EntryMemTable's addEntry can do throttleWriters() in some cases) thus
> > > dragging overall client's view of add latency up even though it is possible
> > > that journal's write (i.e. in case of dedicated journal disk) will complete
> > > faster.
> > >
> > >     private void addEntryInternal(LedgerDescriptor handle, ByteBuffer entry,
> > >                                   WriteCallback cb, Object ctx)
> > >             throws IOException, BookieException {
> > >         long ledgerId = handle.getLedgerId();
> > >         entry.rewind();
> > >         // ledgerStorage.addEntry() is happening here
> > >         long entryId = handle.addEntry(entry);
> > >
> > >         entry.rewind();
> > >         writeBytes.add(entry.remaining());
> > >
> > >         LOG.trace("Adding {}@{}", entryId, ledgerId);
> > >         // journal add entry is happening here
> > >         // callback/response to client is sent after journal add is done.
> > >         journal.logAddEntry(entry, cb, ctx);
> > >     }
> > >
> > > ----------
> > > Andrey Yegorov
> > >
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>

--
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then you win. - Mahatma Gandhi
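
To make the piggy-backed case discussed above concrete, here is a minimal toy model. It is not BookKeeper code: the class `PiggybackLacSketch`, its in-memory map, and the helpers `entry` and `isReadableByTailingReader` are hypothetical names made up for illustration, and taking the max of the stored and incoming LAC is an assumption about what an update would do. It only mirrors the order shown in the quoted addEntry (store the entry in memory, then update the LAC carried in the entry header) and assumes entry N carries LAC = N-1. It does not model the journal or Explicit LAC, so it says nothing about that path.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Toy model only; all names here are hypothetical, not BookKeeper APIs.
public class PiggybackLacSketch {
    // Stand-in for the memTable: entryId -> entry bytes (single ledger).
    private final Map<Long, ByteBuffer> memTable = new HashMap<>();
    // Stand-in for the LAC kept by the ledger cache; -1 means "none yet".
    private long lastAddConfirmed = -1L;

    // Same order as the quoted addEntry: memory first, then the LAC update.
    public long addEntry(ByteBuffer entry) {
        long ledgerId = entry.getLong(); // read to advance past the header field
        long entryId = entry.getLong();
        long lac = entry.getLong();      // LAC piggy-backed by the writer
        entry.rewind();
        memTable.put(entryId, entry);
        // Assumption: the stored LAC never moves backwards.
        lastAddConfirmed = Math.max(lastAddConfirmed, lac);
        return entryId;
    }

    // A tailing reader is modelled as only reading entries up to the LAC.
    public boolean isReadableByTailingReader(long entryId) {
        return entryId <= lastAddConfirmed && memTable.containsKey(entryId);
    }

    public long getLastAddConfirmed() {
        return lastAddConfirmed;
    }

    // Builds a 24-byte header: ledgerId, entryId, piggy-backed LAC.
    public static ByteBuffer entry(long ledgerId, long entryId, long lac) {
        ByteBuffer b = ByteBuffer.allocate(24);
        b.putLong(ledgerId).putLong(entryId).putLong(lac);
        b.flip();
        return b;
    }

    public static void main(String[] args) {
        PiggybackLacSketch s = new PiggybackLacSketch();
        s.addEntry(entry(1L, 0L, -1L)); // entry 0 carries LAC = -1
        s.addEntry(entry(1L, 1L, 0L));  // entry 1 carries LAC = 0
        System.out.println("LAC = " + s.getLastAddConfirmed());
        System.out.println("entry 0 readable: " + s.isReadableByTailingReader(0L));
        System.out.println("entry 1 readable: " + s.isReadableByTailingReader(1L));
    }
}
```

In this model, adding entry 1 to memory makes entry 0 visible to a tailing reader but not entry 1 itself, which is the piggy-backed argument. Whether the same holds once the LAC can move without a following entry is exactly the open question in this thread.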