On Wed, Dec 14, 2022 at 3:48 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Dec 14, 2022 at 9:50 AM houzj.f...@fujitsu.com > <houzj.f...@fujitsu.com> wrote: > > > > On Tuesday, December 13, 2022 11:25 PM Masahiko Sawada > > <sawada.m...@gmail.com> wrote: > > > > > > Here are comments on v59 0001, 0002 patches: > > > > Thanks for the comments! > > > > > +void > > > +pa_increment_stream_block(ParallelApplyWorkerShared *wshared) { > > > + while (1) > > > + { > > > + SpinLockAcquire(&wshared->mutex); > > > + > > > + /* > > > + * Don't try to increment the count if the parallel > > > apply worker is > > > + * taking the stream lock. Otherwise, there would be > > > a race condition > > > + * that the parallel apply worker checks there is no > > > pending streaming > > > + * block and before it actually starts waiting on a > > > lock, the leader > > > + * sends another streaming block and take the stream > > > lock again. In > > > + * this case, the parallel apply worker will start > > > waiting for the next > > > + * streaming block whereas there is actually a > > > pending streaming block > > > + * available. > > > + */ > > > + if (!wshared->pa_wait_for_stream) > > > + { > > > + wshared->pending_stream_count++; > > > + SpinLockRelease(&wshared->mutex); > > > + break; > > > + } > > > + > > > + SpinLockRelease(&wshared->mutex); > > > + } > > > +} > > > > > > I think we should add an assertion to check if we don't hold the stream > > > lock. > > > > > > I think that waiting for pa_wait_for_stream to be false in a busy loop is > > > not a > > > good idea. It's not interruptible and there is not guarantee that we can > > > break > > > from this loop in a short time. For instance, if PA executes > > > pa_decr_and_wait_stream_block() a bit earlier than LA executes > > > pa_increment_stream_block(), LA has to wait for PA to acquire and release > > > the > > > stream lock in a busy loop. It should not be long in normal cases but the > > > duration LA needs to wait for PA depends on PA, which could be long. Also > > > what if PA raises an error in > > > pa_lock_stream() due to some reasons? I think LA won't be able to detect > > > the > > > failure. > > > > > > I think we should at least make it interruptible and maybe need to add > > > some > > > sleep. Or perhaps we can use the condition variable for this case. > > > > Or we can leave this while (true) logic altogether for the first > version and have a comment to explain this race. Anyway, after > restarting, it will probably be solved. We can always change this part > of the code later if this really turns out to be problematic. >
+1. Thank you Hou-san for adding this comment in the latest version (v61) patch! Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com