Yes; That is exactly how my theory is and the fix is also the same. But
still looking into how we could get into it.

// If the object is recycled while this task is on the queue lh could get null.
// monitorPendingAddOps() can't schedule after recycling because
// time timed out pendingAddOp won't be available on the
// pendingAddOps concurrent queue after a successful recycle.
//if (completed) {
if (lh == null || completed) {



On Tue, Feb 16, 2021 at 2:30 PM Andrey Yegorov <andrey.yego...@gmail.com>
wrote:

> This could happen if PendingAddOp got recycled (and set lh to null) before
> the runnable submitted by timeoutQuorumWait had a chance to run.
> I don't know how it got into this state but if is easy to miss if recycled
> object gets reused quickly - timeout will fail wrong ledger.
>
> The right thing to do is to figure bout how recycling could happen before
> timeout runnable had a chance to run.
>
> As workaround you can simply cache ledger handle before creating runnable
> and use the cached value in the runnable.
> Hope this makes sense.
>
> On Tue, Feb 16, 2021 at 13:12 Enrico Olivelli <eolive...@gmail.com> wrote:
>
> > Sorry
> > I never seen that error
> >
> > Enrico
> >
> > Il giorno mar 16 feb 2021 alle ore 19:50 Venkateswara Rao Jujjuri <
> > jujj...@gmail.com> ha scritto:
> >
> > > Ignore my comment about different thread, I see that
> timeoutQuorumWait()
> > is
> > > run through OSE.
> > > But we did NPE in this line
> > > <
> > >
> >
> https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L173
> > > >.
> > > In this run, we have disabled ensemble changes.
> > >
> > > On Tue, Feb 16, 2021 at 10:28 AM Venkateswara Rao Jujjuri <
> > > jujj...@gmail.com>
> > > wrote:
> > >
> > > > We have observed NPE in pendingAddOp in this line
> > > > <
> > >
> >
> https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L173
> > > >.
> > > >
> > > > This looks like a race between execution of recyclePendAddOpObject()
> in
> > > > OSE context and maybeTimeout() in monitorPendingAddOps() executing in
> > > > bk.scheduler context.
> > > >
> > > > This looks like an age-old code and wondering why we haven't seen
> this
> > so
> > > > far.
> > > > Has Anyone in the community observed this?
> > > >
> > > >
> > > > --
> > > > Jvrao
> > > > ---
> > > > First they ignore you, then they laugh at you, then they fight you,
> > then
> > > > you win. - Mahatma Gandhi
> > > >
> > > >
> > > >
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> then
> > > you win. - Mahatma Gandhi
> > >
> >
> --
> ----------
> Andrey Yegorov
>


-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Reply via email to