On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond...@gmail.com> wrote: > > The patches have been rebased on the latest pgHead following the merge > of the conflict detection patch [1].
Thanks for working on patches. Summarizing the issues which need some suggestions/thoughts. 1) For subscription based resolvers, currently the syntax implemented is: 1a) CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname> CONFLICT RESOLVER (conflict_type1 = resolver1, conflict_type2 = resolver2, conflict_type3 = resolver3,...); 1b) ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER (conflict_type1 = resolver1, conflict_type2 = resolver2, conflict_type3 = resolver3,...); Earlier the syntax suggested in [1] was: CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1', CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2'; I think the currently implemented syntax is good as it has less repetition, unless others think otherwise. ~~ 2) For subscription based resolvers, do we need a RESET command to reset resolvers to default? Any one of below or both? 2a) reset all at once: ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS 2b) reset one at a time: ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type'; The issue I see here is, to implement 1a and 1b, we have introduced the 'RESOLVER' keyword. If we want to implement 2a, we will have to introduce the 'RESOLVERS' keyword as well. But we can come up with some alternative syntax if we plan to implement these. Thoughts? ~~ 3) Regarding update_exists: 3a) Currently update_exists resolver patch is kept separate. The reason being, it performs resolution which will need deletion of multiple rows. It will be good to discuss if we want to target this in the first draft. Please see the example: create table tab (a int primary key, b int unique, c int unique); Pub: insert into tab values (1,1,1); Sub: insert into tab values (2,20,30); insert into tab values (3,40,50); insert into tab values (4,60,70); Pub: update tab set a=2,b=40,c=70 where a=1; The above 'update' on pub will result in 'update_exists' on sub and if resolution is in favour of 'apply', then it will conflict with all the three local rows of subscriber due to unique constraint present on all three columns. Thus in order to resolve the conflict, it will have to delete these 3 rows on sub: 2,20,30 3,40,50 4,60,70 and then update 1,1,1 to 2,40,70. Just need opinion on if we shall target this in the initial draft. 3b) If we plan to implement this, we need to work on optimal design where we can find all the conflicting rows at once and delete those. Currently the implementation has been done using recursion i.e. find one conflicting row, then delete it and then next and so on i.e. we call apply_handle_update_internal() recursively. On initial code review, I feel it is doable to scan all indexes at once and get conflicting-tuple-ids in one go and get rid of recursion. It can be attempted once we decide on 3a. ~~ 4) Now for insert_exists and update_exists, we are doing a pre-scan of all unique indexes to find conflict. Also there is post-scan to figure out if the conflicting row is inserted meanwhile. This needs to be reviewed for optimization. We need to avoid pre-scan wherever possible. I think the only case for which it can be avoided is 'ERROR'. For the cases where resolver is in favor of remote-apply, we need to check conflict beforehand to avoid rollback of already inserted data. And for the case where resolver is in favor of skipping the change, then too we should know beforehand about the conflict to avoid heap-insertion and rollback. Thoughts? ~~ 5) Currently we only capture update_missing conflict i.e. we are not distinguishing between the missing row and the deleted row. We had discussed this in the past a couple of times. If we plan to target it in draft 1, I can dig up all old emails and resume discussion on this. ~~ 6) Table-level resolves. There was a suggestion earlier to implement table-level resolvers. The patch has been implemented to some extent, it can be completed and posted when we are done reviewing subscription level resolvers. ~~ [1]: https://www.postgresql.org/message-id/CAA4eK1LhD%3DC5UwDeKxC_5jK4_ADtM7g%2BMoFW9qhziSxHbVVfeQ%40mail.gmail.com For clock-skew and timestamp based resolution, if needed, I will post another email for the design items where suggestions are needed. thanks Shveta