The real problem is that having an extremely fast journal disk doesn't
really mask write latencies from a slower ledger disk.

To address this rate correctness issue, can't we read from the journal if
the entryId >= LAC (as we now cache it on the bookie) and journal read fails?
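
As a rough sketch of that idea only: the class below is a toy model, not
BookKeeper code, and every name in it (JournalFallbackReadSketch,
journalTail, cachedLac, ...) is hypothetical. It assumes the fallback is
taken when the ledger storage lookup misses and the entry id is at or
beyond the LAC cached on the bookie.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of a bookie read path with a journal fallback (hypothetical, not BookKeeper code). */
final class JournalFallbackReadSketch {
    // Stand-in for ledger storage (memtable + entry log), keyed by entry id.
    private final Map<Long, String> ledgerStorage = new ConcurrentHashMap<>();
    // Stand-in for journal entries that were acked but not yet applied to ledger storage.
    private final Map<Long, String> journalTail = new ConcurrentHashMap<>();
    // LAC as cached on the bookie; only advanced when an entry reaches ledger storage.
    private volatile long cachedLac = -1L;

    /** Write path under discussion: the entry is journaled (and acked) first. */
    void appendToJournal(long entryId, String payload) {
        journalTail.put(entryId, payload);
    }

    /** Background step: copy the entry into ledger storage and record the piggy-backed LAC. */
    void applyToLedgerStorage(long entryId, long piggybackedLac) {
        String payload = journalTail.get(entryId);
        if (payload == null) {
            return; // nothing journaled for this id
        }
        ledgerStorage.put(entryId, payload);
        cachedLac = Math.max(cachedLac, piggybackedLac);
        journalTail.remove(entryId); // remove only after the entry is readable from storage
    }

    /** Read from ledger storage; on a miss at or beyond the cached LAC, try the journal. */
    Optional<String> read(long entryId) {
        String fromStorage = ledgerStorage.get(entryId);
        if (fromStorage != null) {
            return Optional.of(fromStorage);
        }
        if (entryId >= cachedLac) {
            return Optional.ofNullable(journalTail.get(entryId));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        JournalFallbackReadSketch bookie = new JournalFallbackReadSketch();
        bookie.appendToJournal(5L, "entry-5");
        System.out.println("before apply: " + bookie.read(5L));
        bookie.applyToLedgerStorage(5L, 4L);
        System.out.println("after apply:  " + bookie.read(5L));
    }
}
```

A real journal is an append-only file, so an actual implementation would
need some index over the un-applied tail; the map here just stands in for
that.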

On Mon, May 1, 2017 at 6:33 PM, Sijie Guo <[email protected]> wrote:

> Another way to think about this:
>
> when 'throttling' happens,  it typically means:
>
> - the bookie doesn't have enough bandwidth/capacity to keep up with the
> traffic.
> - the disks on the bookie might have problems (e.g. slow down or other
> hardware issues).
>
> Either case can happen. It might be worth letting the throttling kick in,
> rather than letting the journal disk accept writes and push ledger storage
> into a worse state.
>
> - Sijie
>
> On Mon, May 1, 2017 at 6:23 PM, Sijie Guo <[email protected]> wrote:
>
> >
> >
> > On Mon, May 1, 2017 at 6:14 PM, Venkateswara Rao Jujjuri <
> > [email protected]> wrote:
> >
> >> On Mon, May 1, 2017 at 6:03 PM, Venkateswara Rao Jujjuri <
> >> [email protected]>
> >> wrote:
> >>
> >> >
> >> >
> >> > On Mon, May 1, 2017 at 5:56 PM, Sijie Guo <[email protected]> wrote:
> >> >
> >> >> I don't think this is an inconsistency issue. The in-memory update
> >> >> updates the LAC, not the current entry. Even though the entry is added
> >> >> to memory, it will not be readable until the LAC is advanced, and the
> >> >> LAC is advanced only after the next entry is added, which happens
> >> >> after the current entry is acked.
> >> >>
> >> >
> >> > That is not true. You are talking about piggy-backed LAC only. But
> >> > with Explicit LAC you don't need the next entry to move the LAC on the
> >> > bookie.
> >> >
> >>
> >> Sorry, I pushed send before finishing. :)
> >>
> >> So you don't need the next entry to move the LAC forward, but it is the
> >> client's job to move the LAC forward.
> >> Hence the client needs to send an explicit LAC to update the LAC after it
> >> hears back from the AckQuorum.
> >> Hence Sijie is right on this part, it is not a consistency issue. :)
> >>
> >>
> >> Nevertheless, I believe we need to change the order, as it is not
> >> completely shielding writes from other activity. @Sijie do you see any
> >> issue if we write to the journal, ack to the client, and then write to
> >> the ledger?
> >>
> >
> > Based on my understanding of this email thread, the concern is write
> > latency. However, it doesn't change the latency behavior if you add to
> > the journal first and add to the memtable later. 'Throttling' will still
> > happen when you add the entry to the memtable.
> >
> > So the question would be "can we write to the journal, ack back
> > immediately after the journal write, and add the entry to the memtable in
> > the background"?
> >
> > The answer would be "no", because this would violate correctness. It
> > might end up in a case where the LAC is already advanced but the entry is
> > not found. It can happen in the following sequence.
> >
> > - Client issues a write for entry N (lac = N-1).
> > - Bookie writes the entry to the journal and acknowledges. Entry N is in
> > the journal but hasn't been added to the memtable.
> > - Client receives the acknowledgement and advances LAC from N-1 to N.
> > - Client writes another entry N+1 (lac = N) to advance LAC.
> > - Another client (reader) detects that LAC has advanced from N-1 to N. It
> > attempts to read entry N, but N hasn't been added to ledger storage. (*The
> > correctness is violated here*)
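
The sequence above can be replayed in a small single-threaded model. This
is only an illustration under simplifying assumptions: the class and
method names are hypothetical (not BookKeeper APIs), and the LAC advance
is collapsed into one advanceLac call that stands in for either the
piggy-backed LAC on entry N+1 or an explicit LAC update.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy replay of "ack after journal only" (hypothetical model, not BookKeeper code). */
final class AckBeforeMemtableRace {
    private final Map<Long, String> journal = new HashMap<>();
    private final Map<Long, String> memtable = new HashMap<>();
    private long lac = -1L;

    /** Hypothetical write path: journal the entry and ack without touching the memtable. */
    boolean addEntryJournalOnly(long entryId, String payload) {
        journal.put(entryId, payload);
        return true; // the ack goes out before the entry is readable
    }

    /** Background step that would run some time after the ack. */
    void flushToMemtable(long entryId) {
        memtable.put(entryId, journal.get(entryId));
    }

    /** Stand-in for the LAC moving forward once the writer has seen the ack. */
    void advanceLac(long newLac) {
        lac = Math.max(lac, newLac);
    }

    long readLac() {
        return lac;
    }

    /** Readers are served from the memtable only; null means "entry not found". */
    String readEntry(long entryId) {
        return memtable.get(entryId);
    }

    /** Replays the sequence and reports whether the reader misses a confirmed entry. */
    static boolean readerMissesConfirmedEntry() {
        AckBeforeMemtableRace bookie = new AckBeforeMemtableRace();
        long n = 7L;
        boolean acked = bookie.addEntryJournalOnly(n, "entry-N"); // steps 1-2
        if (acked) {
            bookie.advanceLac(n);                                 // steps 3-4
        }
        long confirmed = bookie.readLac();                        // step 5: reader sees LAC = N
        return bookie.readEntry(confirmed) == null;               // N is not in the memtable yet
    }

    public static void main(String[] args) {
        System.out.println("reader misses confirmed entry: " + readerMissesConfirmedEntry());
    }
}
```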
> >
> > So to summarize my thoughts on this:
> >
> > - The acknowledgement should happen after both writing the entry to the
> > journal and writing the entry to the memtable.
> > - The order of writing the entry to the journal and writing the entry to
> > the memtable doesn't matter here.
> > - Writing the entry to the memtable helps with tailing latency (because
> > it will advance the LAC first).
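
The first two points of the summary above can be sketched with plain
java.util.concurrent: the ack is gated on both writes, and it does not
matter which one finishes first. The class and method names are
hypothetical and this is not how the bookie is actually wired; it only
illustrates the ordering rule.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: acknowledge only after both the journal and the memtable write are done. */
final class AckAfterBothSketch {

    /**
     * Runs {@code ack} once both futures have completed successfully,
     * regardless of which one completes first.
     */
    static CompletableFuture<Void> ackWhenBothDone(CompletableFuture<Void> journalWrite,
                                                   CompletableFuture<Void> memtableWrite,
                                                   Runnable ack) {
        return CompletableFuture.allOf(journalWrite, memtableWrite).thenRun(ack);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> journalWrite = new CompletableFuture<>();
        CompletableFuture<Void> memtableWrite = new CompletableFuture<>();
        AtomicBoolean acked = new AtomicBoolean(false);

        ackWhenBothDone(journalWrite, memtableWrite, () -> acked.set(true));

        journalWrite.complete(null); // fast journal disk finishes first
        System.out.println("acked after journal only: " + acked.get());

        memtableWrite.complete(null); // memtable add finishes (possibly after throttling)
        System.out.println("acked after both: " + acked.get());
    }
}
```

Note that this keeps the latency behavior described above: if the memtable
add is throttled, the ack still waits for it.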
> >
> > - Sijie
> >
> >
> >>
> >> JV
> >>
> >>
> >> >
> >> >
> >> >> So adding the entry to memory doesn't expose any consistency issue.
> >> >>
> >> >> On May 1, 2017 5:44 PM, "Venkateswara Rao Jujjuri" <[email protected]>
> >> >> wrote:
> >> >>
> >> >> On Mon, May 1, 2017 at 2:31 PM, Yiming Zang <[email protected]>
> >> >> wrote:
> >> >>
> >> >> > Hi Andrey,
> >> >> >
> >> >> > That's a good point, and you're correct that if the write to the
> >> >> > memTable gets throttled, the addEntry request latency will be
> >> >> > affected a lot. This has actually happened a few times in a
> >> >> > production cluster. Normally, the idea of using a Journal is to
> >> >> > write data to the write-ahead log first and then persist the actual
> >> >> > data to disk or add it to the memTable. However, my understanding of
> >> >> > why we chose to write the entry to ledgerStorage first is to improve
> >> >> > tailing-read performance.
> >> >> >
> >> >> > In SortedLedgerStorage.java, we first add the entry to the memTable
> >> >> > and then update lastAddConfirmed, which means that a long-poll read
> >> >> > request or readLastAddConfirmed request will immediately be
> >> >> > satisfied for the latest entry before we actually log the entry into
> >> >> > the Journal. So a tailing read doesn't need to wait for any disk
> >> >> > operation in BookKeeper, including the Journal operation.
> >> >> >
> >> >> > public long addEntry(ByteBuffer entry) throws IOException {
> >> >> > long ledgerId = entry.getLong();
> >> >> > long entryId = entry.getLong();
> >> >> > long lac = entry.getLong();
> >> >> > entry.rewind();
> >> >> > memTable.addEntry(ledgerId, entryId, entry, this);
> >> >> > ledgerCache.updateLastAddConfirmed(ledgerId, lac);
> >> >> > return entryId;
> >> >> > }
> >> >> >
> >> >> > But thinking about it here, I'm wondering if it's actually safe to
> >> >> > update the LAC before we write the entry to the Journal. What if we
> >> >> > tell the client the LAC has been updated, but we actually failed to
> >> >> > write the entry to the Journal and the Bookie crashed at that time?
> >> >> > Would this cause any inconsistency issue?
> >> >> >
> >> >>
> >> >> Good point. This is indeed an inconsistency issue. BK guarantees "if
> >> >> you read once you can read it all the time".
> >> >> If it is really done for LAC, it is not really a good idea. Unless I am
> >> >> missing something, this must be changed ASAP.
> >> >>
> >> >> Thanks,
> >> >> JV
> >> >>
> >> >>
> >> >> >
> >> >> > On Mon, May 1, 2017 at 2:13 PM, Andrey Yegorov <[email protected]>
> >> >> > wrote:
> >> >> >
> >> >> > > Hi,
> >> >> > >
> >> >> > > Looking at the code in Bookie.java, I noticed that the write to the
> >> >> > > journal (which is supposed to be a write-ahead log, as I understand
> >> >> > > it) happens after the write to ledger storage.
> >> >> > > This looks counter-intuitive; can someone explain why it is done in
> >> >> > > this order?
> >> >> > >
> >> >> > > My primary concern is that the ledger storage write can be delayed
> >> >> > > (e.g. EntryMemTable's addEntry can call throttleWriters() in some
> >> >> > > cases), thus dragging the client's overall view of add latency up
> >> >> > > even though the journal's write (e.g. with a dedicated journal disk)
> >> >> > > may complete faster.
> >> >> > >
> >> >> > >     private void addEntryInternal(LedgerDescriptor handle, ByteBuffer entry,
> >> >> > >                                   WriteCallback cb, Object ctx)
> >> >> > >             throws IOException, BookieException {
> >> >> > >         long ledgerId = handle.getLedgerId();
> >> >> > >         entry.rewind();
> >> >> > >         // ledgerStorage.addEntry() is happening here
> >> >> > >         long entryId = handle.addEntry(entry);
> >> >> > >
> >> >> > >         entry.rewind();
> >> >> > >         writeBytes.add(entry.remaining());
> >> >> > >
> >> >> > >         LOG.trace("Adding {}@{}", entryId, ledgerId);
> >> >> > >         // journal add entry is happening here
> >> >> > >         // callback/response to client is sent after journal add is done.
> >> >> > >         journal.logAddEntry(entry, cb, ctx);
> >> >> > >     }
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > ----------
> >> >> > > Andrey Yegorov
> >> >> > >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jvrao
> >> >> ---
> >> >> First they ignore you, then they laugh at you, then they fight you,
> >> then
> >> >> you win. - Mahatma Gandhi
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
> >
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
