On Mon, May 1, 2017 at 6:14 PM, Venkateswara Rao Jujjuri <jujj...@gmail.com>
wrote:

> On Mon, May 1, 2017 at 6:03 PM, Venkateswara Rao Jujjuri <
> jujj...@gmail.com>
> wrote:
>
> >
> >
> > On Mon, May 1, 2017 at 5:56 PM, Sijie Guo <guosi...@gmail.com> wrote:
> >
> >> I don't think this is an inconsistency issue. The in-memory update is
> >> updating the LAC, not the current entry. Even if the entry is added
> >> into memory, it will not be readable until the LAC is advanced, and the
> >> LAC is advanced only after the next entry is added, which happens after
> >> the current entry is acked.
> >>
> >
> > That is not true. You are talking about piggy-backed LAC only. But with
> > Explicit LAC
> > you don't need next entry to move LAC on bookie.
> >
>
> Sorry, I pushed send before finishing. :)
>
> So you don't need the next entry to move LAC forward, but it is the
> client's job to move LAC forward.
> Hence the client needs to send an explicit LAC to update LAC after it
> hears back from AckQuorum.
> Hence Sijie is right on this part, it is not a consistency issue. :)
>
>
> But nevertheless, I believe we need to change the order, as it is not
> completely shielding writes from other activity. @Sijie do you see any
> issue if we write to the journal, ack to the client, and then write to
> the ledger?
>

Based on my understanding of this email thread, the concern is write
latency. However, adding to the journal first and to the memtable later
doesn't change the latency behavior. 'Throttling' will still happen when
you add the entry to the memtable.
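
To make the latency point concrete, here is a rough sketch. It assumes the
two steps run back to back and the ack is sent only after both have
finished. The class name, method name, and microsecond figures are made up
for illustration and are not BookKeeper code or measurements.

```java
// Toy arithmetic only: names and numbers are hypothetical.
class AddLatencyModel {

    /** Ack latency when the steps run one after another and the ack waits for all of them. */
    static long ackLatencyMicros(long... stepLatenciesInOrder) {
        long total = 0L;
        for (long step : stepLatenciesInOrder) {
            total += step;
        }
        return total;
    }

    public static void main(String[] args) {
        long journalWrite = 500L;          // assumed fast write on a dedicated journal disk
        long throttledMemtable = 20_000L;  // assumed memtable add that hit throttling

        // The sum is the same whichever step goes first.
        System.out.println("memtable first: " + ackLatencyMicros(throttledMemtable, journalWrite));
        System.out.println("journal first:  " + ackLatencyMicros(journalWrite, throttledMemtable));
    }
}
```

Under these assumptions the throttled memtable add dominates the ack
latency in both orders.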

So the question would be "can we write to the journal, ack back immediately
after the journal write, and add the entry to the memtable in the
background"?

The answer would be "no", because this would violate correctness. It
might end up in a case where the LAC is already advanced but the entry is
not found. It can happen in the following sequence.

- Client issues a write for entry N (lac = N-1).
- Bookie writes the entry to the journal and acknowledges. Entry N is in the
journal but hasn't been added to the memtable.
- Client receives the acknowledgement and advances LAC from N-1 to N.
- Client writes another entry N+1 (lac = N) to advance LAC.
- Another client (reader) detects that LAC has advanced from N-1 to N. It
attempts to read entry N, but N hasn't been added to ledger storage. (*The
correctness is violated here*)
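
The sequence can be replayed with a small single-threaded toy model. This
is only a sketch: ToyBookie and its methods are hypothetical names, not
BookKeeper classes. It also does not model which bookie in the quorum
serves the LAC. It simply assumes the advanced LAC becomes visible while
entry N is still only in the journal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Toy model only: none of these names are BookKeeper classes or methods.
class EarlyAckRace {

    /** Hypothetical bookie that acks after the journal write and defers the memtable add. */
    static class ToyBookie {
        final Map<Long, String> journal = new LinkedHashMap<>();
        final Map<Long, String> memtable = new LinkedHashMap<>();
        final Map<Long, String> pendingMemtableAdds = new LinkedHashMap<>();
        long lac = -1L;

        /** Journals the entry, records the piggy-backed LAC, and acks at once. */
        boolean addEntryAckAfterJournal(long entryId, long piggyBackedLac, String data) {
            journal.put(entryId, data);
            pendingMemtableAdds.put(entryId, data); // memtable add is left for later
            lac = Math.max(lac, piggyBackedLac);    // assumption: LAC is visible right away
            return true;                            // ack
        }

        /** Stands in for the background thread that fills the memtable. */
        void drainBackgroundAdds() {
            memtable.putAll(pendingMemtableAdds);
            pendingMemtableAdds.clear();
        }

        /** Reads are served from the memtable only. */
        Optional<String> readEntry(long entryId) {
            return Optional.ofNullable(memtable.get(entryId));
        }
    }

    public static void main(String[] args) {
        ToyBookie bookie = new ToyBookie();
        long n = 5L;

        bookie.addEntryAckAfterJournal(n, n - 1, "entry N");   // write N, acked from the journal
        bookie.addEntryAckAfterJournal(n + 1, n, "entry N+1"); // write N+1 carrying lac = N

        long lacSeenByReader = bookie.lac;                      // reader observes LAC = N
        System.out.println("reader sees LAC=" + lacSeenByReader + ", entry readable="
                + bookie.readEntry(lacSeenByReader).isPresent());
        // prints: reader sees LAC=5, entry readable=false

        bookie.drainBackgroundAdds();
        System.out.println("after background add, entry readable="
                + bookie.readEntry(lacSeenByReader).isPresent());
        // prints: after background add, entry readable=true
    }
}
```

The first read is the violation: the LAC says entry N is confirmed, but
the read path cannot find it yet.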

So to summarize my thoughts on this:

- The acknowledgement should happen after both writing the entry to the
journal and writing the entry to the memtable.
- The order of writing the entry to the journal and writing the entry to
the memtable doesn't matter here.
- Writing the entry to the memtable first helps with tailing latency
(because it will advance LAC first).
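
A minimal sketch of the first two points, again with hypothetical names
(this is not the Bookie code path): both writes are started without any
ordering between them, and the ack future completes only once both are
done.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: AckAfterBoth is a hypothetical name, not a BookKeeper class.
class AckAfterBoth {
    final Map<Long, String> journal = new ConcurrentHashMap<>();
    final Map<Long, String> memtable = new ConcurrentHashMap<>();

    /** Returns a future that completes (the "ack") only after both writes have finished. */
    CompletableFuture<Long> addEntry(long entryId, String data) {
        // No ordering is imposed between the two writes.
        CompletableFuture<Void> journalWrite =
                CompletableFuture.runAsync(() -> journal.put(entryId, data));
        CompletableFuture<Void> memtableWrite =
                CompletableFuture.runAsync(() -> memtable.put(entryId, data));

        // The ack is gated on both, so an acked entry is always readable.
        return CompletableFuture.allOf(journalWrite, memtableWrite)
                .thenApply(ignored -> entryId);
    }

    public static void main(String[] args) {
        AckAfterBoth bookie = new AckAfterBoth();
        long acked = bookie.addEntry(7L, "payload").join();
        System.out.println("acked " + acked
                + ", in journal=" + bookie.journal.containsKey(acked)
                + ", in memtable=" + bookie.memtable.containsKey(acked));
        // prints: acked 7, in journal=true, in memtable=true
    }
}
```

Because the ack waits on both futures, a client can never advance LAC
past an entry that the read path cannot serve, whichever write finishes
first.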

- Sijie


>
> JV
>
>
> >
> >
> >> So adding the entry to memory doesn't expose any consistency issue.
> >>
> >> On May 1, 2017 5:44 PM, "Venkateswara Rao Jujjuri" <jujj...@gmail.com>
> >> wrote:
> >>
> >> On Mon, May 1, 2017 at 2:31 PM, Yiming Zang <yz...@twitter.com.invalid>
> >> wrote:
> >>
> >> > Hi Andrey,
> >> >
> >> > That's a good point, and you're actually correct that if a write to
> >> > memTable gets throttled somehow, the addEntry request latency will be
> >> > affected a lot. This actually happens a few times in the production
> >> > cluster. Normally, the idea of using a Journal is to write data to the
> >> > write-ahead log and then persist the actual data to disk or add it to
> >> > memTable. However, my understanding of why we choose to write the
> >> > entry to ledgerStorage first is that it improves tailing-read
> >> > performance.
> >> >
> >> > In SortedLedgerStorage.java, we first add the entry to memTable and
> >> > then update lastAddConfirmed, which means that a long poll read
> >> > request or readLastAddConfirmed request will immediately be satisfied
> >> > for the latest entry before we actually log the entry into the
> >> > Journal. So a tailing read doesn't need to wait for any disk operation
> >> > in BookKeeper, including the Journal operation.
> >> >
> >> > public long addEntry(ByteBuffer entry) throws IOException {
> >> >     // The entry starts with three longs: ledger id, entry id, LAC.
> >> >     long ledgerId = entry.getLong();
> >> >     long entryId = entry.getLong();
> >> >     long lac = entry.getLong();
> >> >     entry.rewind();
> >> >     // The entry goes into memTable first, then the LAC is updated.
> >> >     memTable.addEntry(ledgerId, entryId, entry, this);
> >> >     ledgerCache.updateLastAddConfirmed(ledgerId, lac);
> >> >     return entryId;
> >> > }
> >> >
> >> > But thinking about it here, I'm wondering if it's actually safe to
> >> > update the LAC before we write the entry to the Journal. What if we
> >> > tell the client the LAC has been updated, but we actually fail to
> >> > write the entry to the Journal and the Bookie crashes at that time?
> >> > Would this cause any inconsistency issue?
> >> >
> >>
> >> Good point. This is indeed an inconsistency issue. BK guarantees "if you
> >> read once you can read it all the time".
> >> If it is really done for LAC, it is not really a good idea. Unless I am
> >> missing something, this must be changed ASAP.
> >>
> >> Thanks,
> >> JV
> >>
> >>
> >> >
> >> > On Mon, May 1, 2017 at 2:13 PM, Andrey Yegorov <
> >> andrey.yego...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > Looking at the code in Bookie.java, I noticed that the write to the
> >> > > journal (which is supposed to be a write-ahead log, as I understand)
> >> > > happens after the write to ledger storage.
> >> > > This looks counter-intuitive. Can someone explain why it is done in
> >> > > this order?
> >> > >
> >> > > My primary concern is that the ledger storage write can be delayed
> >> > > (i.e. EntryMemTable's addEntry can do throttleWriters() in some
> >> > > cases), thus dragging the client's overall view of add latency up
> >> > > even though it is possible that the journal's write (i.e. in case of
> >> > > a dedicated journal disk) will complete faster.
> >> > >
> >> > >     private void addEntryInternal(LedgerDescriptor handle,
> >> > >             ByteBuffer entry, WriteCallback cb, Object ctx)
> >> > >             throws IOException, BookieException {
> >> > >         long ledgerId = handle.getLedgerId();
> >> > >         entry.rewind();
> >> > >         // ledgerStorage.addEntry() is happening here
> >> > >         long entryId = handle.addEntry(entry);
> >> > >
> >> > >         entry.rewind();
> >> > >         writeBytes.add(entry.remaining());
> >> > >
> >> > >         LOG.trace("Adding {}@{}", entryId, ledgerId);
> >> > >         // journal add entry is happening here
> >> > >         // callback/response to client is sent after journal add is done.
> >> > >         journal.logAddEntry(entry, cb, ctx);
> >> > >     }
> >> > >
> >> > >
> >> > >
> >> > > ----------
> >> > > Andrey Yegorov
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Jvrao
> >> ---
> >> First they ignore you, then they laugh at you, then they fight you, then
> >> you win. - Mahatma Gandhi
> >>
> >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
> >
> >
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
