On Mon, Aug 26, 2024 at 2:23 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.ma...@gmail.com> wrote: > > > > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond...@gmail.com> > > wrote: > > > > > > The patches have been rebased on the latest pgHead following the merge > > > of the conflict detection patch [1]. > > > > Thanks for working on patches. > > > > Summarizing the issues which need some suggestions/thoughts. > > > > 1) > > For subscription based resolvers, currently the syntax implemented is: > > > > 1a) > > CREATE SUBSCRIPTION <subname> > > CONNECTION <conninfo> PUBLICATION <pubname> > > CONFLICT RESOLVER > > (conflict_type1 = resolver1, conflict_type2 = resolver2, > > conflict_type3 = resolver3,...); > > > > 1b) > > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER > > (conflict_type1 = resolver1, conflict_type2 = resolver2, > > conflict_type3 = resolver3,...); > > > > Earlier the syntax suggested in [1] was: > > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname> > > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1', > > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2'; > > > > I think the currently implemented syntax is good as it has less > > repetition, unless others think otherwise. > > > > ~~ > > > > 2) > > For subscription based resolvers, do we need a RESET command to reset > > resolvers to default? Any one of below or both? > > > > 2a) reset all at once: > > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS > > > > 2b) reset one at a time: > > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type'; > > > > The issue I see here is, to implement 1a and 1b, we have introduced > > the 'RESOLVER' keyword. If we want to implement 2a, we will have to > > introduce the 'RESOLVERS' keyword as well. But we can come up with > > some alternative syntax if we plan to implement these. Thoughts? > > > > It makes sense to have a RESET on the lines of (a) and (b). At this > stage, we should do minimal in extending the syntax. How about RESET > CONFLICT RESOLVER ALL for (a)?
Yes, the syntax looks good. > > ~~ > > > > 3) Regarding update_exists: > > > > 3a) > > Currently update_exists resolver patch is kept separate. The reason > > being, it performs resolution which will need deletion of multiple > > rows. It will be good to discuss if we want to target this in the > > first draft. Please see the example: > > > > create table tab (a int primary key, b int unique, c int unique); > > > > Pub: insert into tab values (1,1,1); > > Sub: > > insert into tab values (2,20,30); > > insert into tab values (3,40,50); > > insert into tab values (4,60,70); > > > > Pub: update tab set a=2,b=40,c=70 where a=1; > > > > The above 'update' on pub will result in 'update_exists' on sub and if > > resolution is in favour of 'apply', then it will conflict with all the > > three local rows of subscriber due to unique constraint present on all > > three columns. Thus in order to resolve the conflict, it will have to > > delete these 3 rows on sub: > > > > 2,20,30 > > 3,40,50 > > 4,60,70 > > and then update 1,1,1 to 2,40,70. > > > > Just need opinion on if we shall target this in the initial draft. > > > > This case looks a bit complicated. It seems there is no other > alternative than to delete the multiple rows. It is better to create a > separate top-up patch for this and we can discuss in detail about this > once the basic patch is in better shape. Agreed. > > > 3b) > > If we plan to implement this, we need to work on optimal design where > > we can find all the conflicting rows at once and delete those. > > Currently the implementation has been done using recursion i.e. find > > one conflicting row, then delete it and then next and so on i.e. we > > call apply_handle_update_internal() recursively. On initial code > > review, I feel it is doable to scan all indexes at once and get > > conflicting-tuple-ids in one go and get rid of recursion. It can be > > attempted once we decide on 3a. > > > > I suggest following the simplest strategy (even if that means calling > the update function recursively) by adding comments on the optimal > strategy. We can optimize it later as well. Sure. > > > ~~ > > > > 4) > > Now for insert_exists and update_exists, we are doing a pre-scan of > > all unique indexes to find conflict. Also there is post-scan to figure > > out if the conflicting row is inserted meanwhile. This needs to be > > reviewed for optimization. We need to avoid pre-scan wherever > > possible. I think the only case for which it can be avoided is > > 'ERROR'. For the cases where resolver is in favor of remote-apply, we > > need to check conflict beforehand to avoid rollback of already > > inserted data. And for the case where resolver is in favor of skipping > > the change, then too we should know beforehand about the conflict to > > avoid heap-insertion and rollback. Thoughts? > > > > It makes sense to skip the pre-scan wherever possible. Your analysis > sounds reasonable to me. > > > ~~ > > > > 5) > > Currently we only capture update_missing conflict i.e. we are not > > distinguishing between the missing row and the deleted row. We had > > discussed this in the past a couple of times. If we plan to target it > > in draft 1, I can dig up all old emails and resume discussion on this. > > > > This is a separate conflict detection project in itself. I am thinking > about the solution to this problem. We will talk about this in a > separate thread. > > > ~~ > > > > 6) > > Table-level resolves. There was a suggestion earlier to implement > > table-level resolvers. The patch has been implemented to some extent, > > it can be completed and posted when we are done reviewing subscription > > level resolvers. > > > > Yeah, it makes sense to do it after the subscription-level resolution > patch is ready. > > -- > With Regards, > Amit Kapila.