On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.ma...@gmail.com> wrote: > > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond...@gmail.com> wrote: > > > > The patches have been rebased on the latest pgHead following the merge > > of the conflict detection patch [1]. > > Thanks for working on patches. > > Summarizing the issues which need some suggestions/thoughts. > > 1) > For subscription based resolvers, currently the syntax implemented is: > > 1a) > CREATE SUBSCRIPTION <subname> > CONNECTION <conninfo> PUBLICATION <pubname> > CONFLICT RESOLVER > (conflict_type1 = resolver1, conflict_type2 = resolver2, > conflict_type3 = resolver3,...); > > 1b) > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER > (conflict_type1 = resolver1, conflict_type2 = resolver2, > conflict_type3 = resolver3,...); > > Earlier the syntax suggested in [1] was: > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname> > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1', > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2'; > > I think the currently implemented syntax is good as it has less > repetition, unless others think otherwise. > > ~~ > > 2) > For subscription based resolvers, do we need a RESET command to reset > resolvers to default? Any one of below or both? > > 2a) reset all at once: > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS > > 2b) reset one at a time: > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type'; > > The issue I see here is, to implement 1a and 1b, we have introduced > the 'RESOLVER' keyword. If we want to implement 2a, we will have to > introduce the 'RESOLVERS' keyword as well. But we can come up with > some alternative syntax if we plan to implement these. Thoughts? >
It makes sense to have a RESET on the lines of (a) and (b). At this stage, we should do minimal in extending the syntax. How about RESET CONFLICT RESOLVER ALL for (a)? > ~~ > > 3) Regarding update_exists: > > 3a) > Currently update_exists resolver patch is kept separate. The reason > being, it performs resolution which will need deletion of multiple > rows. It will be good to discuss if we want to target this in the > first draft. Please see the example: > > create table tab (a int primary key, b int unique, c int unique); > > Pub: insert into tab values (1,1,1); > Sub: > insert into tab values (2,20,30); > insert into tab values (3,40,50); > insert into tab values (4,60,70); > > Pub: update tab set a=2,b=40,c=70 where a=1; > > The above 'update' on pub will result in 'update_exists' on sub and if > resolution is in favour of 'apply', then it will conflict with all the > three local rows of subscriber due to unique constraint present on all > three columns. Thus in order to resolve the conflict, it will have to > delete these 3 rows on sub: > > 2,20,30 > 3,40,50 > 4,60,70 > and then update 1,1,1 to 2,40,70. > > Just need opinion on if we shall target this in the initial draft. > This case looks a bit complicated. It seems there is no other alternative than to delete the multiple rows. It is better to create a separate top-up patch for this and we can discuss in detail about this once the basic patch is in better shape. > 3b) > If we plan to implement this, we need to work on optimal design where > we can find all the conflicting rows at once and delete those. > Currently the implementation has been done using recursion i.e. find > one conflicting row, then delete it and then next and so on i.e. we > call apply_handle_update_internal() recursively. On initial code > review, I feel it is doable to scan all indexes at once and get > conflicting-tuple-ids in one go and get rid of recursion. It can be > attempted once we decide on 3a. > I suggest following the simplest strategy (even if that means calling the update function recursively) by adding comments on the optimal strategy. We can optimize it later as well. > ~~ > > 4) > Now for insert_exists and update_exists, we are doing a pre-scan of > all unique indexes to find conflict. Also there is post-scan to figure > out if the conflicting row is inserted meanwhile. This needs to be > reviewed for optimization. We need to avoid pre-scan wherever > possible. I think the only case for which it can be avoided is > 'ERROR'. For the cases where resolver is in favor of remote-apply, we > need to check conflict beforehand to avoid rollback of already > inserted data. And for the case where resolver is in favor of skipping > the change, then too we should know beforehand about the conflict to > avoid heap-insertion and rollback. Thoughts? > It makes sense to skip the pre-scan wherever possible. Your analysis sounds reasonable to me. > ~~ > > 5) > Currently we only capture update_missing conflict i.e. we are not > distinguishing between the missing row and the deleted row. We had > discussed this in the past a couple of times. If we plan to target it > in draft 1, I can dig up all old emails and resume discussion on this. > This is a separate conflict detection project in itself. I am thinking about the solution to this problem. We will talk about this in a separate thread. > ~~ > > 6) > Table-level resolves. There was a suggestion earlier to implement > table-level resolvers. The patch has been implemented to some extent, > it can be completed and posted when we are done reviewing subscription > level resolvers. > Yeah, it makes sense to do it after the subscription-level resolution patch is ready. -- With Regards, Amit Kapila.