On Fri, Feb 3, 2023 at 6:58 PM houzj.f...@fujitsu.com <houzj.f...@fujitsu.com> wrote:
>
> On Thursday, February 2, 2023 7:21 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Thu, Feb 2, 2023 at 12:05 PM houzj.f...@fujitsu.com
> > <houzj.f...@fujitsu.com> wrote:
> > >
> > > On Tuesday, January 31, 2023 1:07 AM vignesh C <vignes...@gmail.com> wrote:
> > > > On Mon, 30 Jan 2023 at 17:30, vignesh C <vignes...@gmail.com> wrote:
> > > > >
> > > I also tried to measure the time of "src/test/subscription/t/002_types.pl"
> > > before and after the patch (change the lock level) and Tom's patch (split
> > > transaction), like what Vignesh has shared on -hackers.
> > >
> > > I ran it about 100 times for each case. Tom's patch and the lock level
> > > patch behave similarly on my machines[1].
> > >
> > > HEAD: 3426 ~ 6425 ms
> > > HEAD + Tom: 3404 ~ 3462 ms
> > > HEAD + Vignesh: 3419 ~ 3474 ms
> > > HEAD + Tom + Vignesh: 3408 ~ 3454 ms
> > >
> > > Even apart from the reduction in test time, lowering the lock level and
> > > locking only the specific object also reduces lock contention, which
> > > benefits users of the exposed function as well as the table sync worker
> > > and the apply worker. So, I think pushing the patch to change the lock
> > > level makes sense.
> > >
> > > And the patch looks good to me.
> > >
> >
> > Thanks for the tests. I also see a reduction in test time variability with
> > Vignesh's patch. I think we can release the locks in case the origin is
> > concurrently dropped, as in the attached patch. I am planning to commit
> > this patch tomorrow unless there are more comments or objections.
> >
> > > While on it, after pushing the patch, I think there is another case that
> > > might also be worth improving: the table sync worker and the apply worker
> > > both try to drop the same origin, which can cause some delay. This is a
> > > different case (not the deadlock), so I feel we can try to improve it in
> > > another patch.
> > >
> >
> > Right, I think that case could be addressed by Tom's patch to some extent,
> > but I am thinking we should also try to analyze whether we can completely
> > avoid the need to remove origins from both processes. One idea could be to
> > introduce another relstate, something like PRE_SYNCDONE, and set it in a
> > separate transaction before we set the state to SYNCDONE and remove the
> > slot and origin in the tablesync worker.
> > Now, if the tablesync worker errors out for some reason during the second
> > transaction, it can remove the slot and origin after restart by checking
> > the state. However, it would add another relstate, which may not be the
> > best way to address this problem. Anyway, that can be done as a separate
> > patch.
>
> Here is an attempt to achieve the same.
> Basically, the patch removes the code that drops the origin in the apply
> worker, and adds a new state PRE_SYNCDONE that is entered after
> synchronization has finished ahead of the apply worker (sublsn set), but
> before dropping the origin and doing the other final cleanups. The tablesync
> worker will restart and redo the cleanup if it fails after reaching the new
> state. Besides, since changes can already be applied to the table in the
> PRE_SYNCDONE state, I also modified the check in
> should_apply_changes_for_rel(). Some other conditions for the origin drop in
> the subscription commands were adjusted in this patch as well.
>
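To make the proposed flow easier to discuss, here is a rough sketch (my own
guess at the shape of the change, not code taken from the posted patch) of how
should_apply_changes_for_rel() in worker.c might treat the new state; the
SUBREL_STATE_PRE_SYNCDONE name and its 'p' flag character are assumptions:

    /*
     * Hypothetical relstate constant (name and character assumed here; they
     * may differ in the actual patch): synchronization is finished, but the
     * origin/slot cleanup has not yet been done by the tablesync worker.
     */
    #define SUBREL_STATE_PRE_SYNCDONE	'p'

    static bool
    should_apply_changes_for_rel(LogicalRepRelMapEntry *rel)
    {
    	if (am_tablesync_worker())
    		return MyLogicalRepWorker->relid == rel->localreloid;

    	/*
    	 * A table in PRE_SYNCDONE is already in sync; only the final
    	 * cleanup (dropping the origin and slot) is still pending, so the
    	 * apply worker can treat it the same way as SYNCDONE.
    	 */
    	return (rel->state == SUBREL_STATE_READY ||
    			((rel->state == SUBREL_STATE_SYNCDONE ||
    			  rel->state == SUBREL_STATE_PRE_SYNCDONE) &&
    			 rel->statelsn <= remote_final_lsn));
    }

If something like that is the direction, the restarted tablesync worker would
presumably just check for PRE_SYNCDONE on startup and redo the origin/slot
drop before advancing to SYNCDONE.
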
BTW, tablesync.c has a large file header comment that describes all of the
relstates, including some examples, so this patch needs to include
modifications to that comment as well.

------
Kind Regards,
Peter Smith.
Fujitsu Australia