On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.ma...@gmail.com> wrote: > > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond...@gmail.com> wrote: > > > > The patches have been rebased on the latest pgHead following the merge > > of the conflict detection patch [1]. > > Thanks for working on patches. > > Summarizing the issues which need some suggestions/thoughts. > > 1) > For subscription based resolvers, currently the syntax implemented is: > > 1a) > CREATE SUBSCRIPTION <subname> > CONNECTION <conninfo> PUBLICATION <pubname> > CONFLICT RESOLVER > (conflict_type1 = resolver1, conflict_type2 = resolver2, > conflict_type3 = resolver3,...); > > 1b) > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER > (conflict_type1 = resolver1, conflict_type2 = resolver2, > conflict_type3 = resolver3,...); > > Earlier the syntax suggested in [1] was: > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname> > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1', > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2'; > > I think the currently implemented syntax is good as it has less > repetition, unless others think otherwise. > > ~~ > > 2) > For subscription based resolvers, do we need a RESET command to reset > resolvers to default? Any one of below or both? > > 2a) reset all at once: > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS > > 2b) reset one at a time: > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type'; > > The issue I see here is, to implement 1a and 1b, we have introduced > the 'RESOLVER' keyword. If we want to implement 2a, we will have to > introduce the 'RESOLVERS' keyword as well. But we can come up with > some alternative syntax if we plan to implement these. Thoughts? > > ~~ > > 3) Regarding update_exists: > > 3a) > Currently update_exists resolver patch is kept separate. The reason > being, it performs resolution which will need deletion of multiple > rows. It will be good to discuss if we want to target this in the > first draft. Please see the example: > > create table tab (a int primary key, b int unique, c int unique); > > Pub: insert into tab values (1,1,1); > Sub: > insert into tab values (2,20,30); > insert into tab values (3,40,50); > insert into tab values (4,60,70); > > Pub: update tab set a=2,b=40,c=70 where a=1; > > The above 'update' on pub will result in 'update_exists' on sub and if > resolution is in favour of 'apply', then it will conflict with all the > three local rows of subscriber due to unique constraint present on all > three columns. Thus in order to resolve the conflict, it will have to > delete these 3 rows on sub: > > 2,20,30 > 3,40,50 > 4,60,70 > and then update 1,1,1 to 2,40,70. > > Just need opinion on if we shall target this in the initial draft. > > 3b) > If we plan to implement this, we need to work on optimal design where > we can find all the conflicting rows at once and delete those. > Currently the implementation has been done using recursion i.e. find > one conflicting row, then delete it and then next and so on i.e. we > call apply_handle_update_internal() recursively. On initial code > review, I feel it is doable to scan all indexes at once and get > conflicting-tuple-ids in one go and get rid of recursion. It can be > attempted once we decide on 3a. > > ~~ > > 4) > Now for insert_exists and update_exists, we are doing a pre-scan of > all unique indexes to find conflict. Also there is post-scan to figure > out if the conflicting row is inserted meanwhile. This needs to be > reviewed for optimization. We need to avoid pre-scan wherever > possible. I think the only case for which it can be avoided is > 'ERROR'. For the cases where resolver is in favor of remote-apply, we > need to check conflict beforehand to avoid rollback of already > inserted data. And for the case where resolver is in favor of skipping > the change, then too we should know beforehand about the conflict to > avoid heap-insertion and rollback. Thoughts? > +1 to the idea of optimization, but it seems that when the resolver is set to ERROR, skipping the pre-scan only optimizes the case where no conflict exists. If a conflict is found, the apply-worker will error out during the pre-scan, and no post-scan occurs, so there's no opportunity for optimization. However, if no conflict is present, we currently do both pre-scan and post-scan. Skipping the pre-scan in this scenario could be a worthwhile optimization, even if it only benefits the no-conflict case.
-- Thanks, Nisha