Re: Conflict Detection and Resolution

Amit Kapila Mon, 26 Aug 2024 01:53:38 -0700

On Thu, Aug 22, 2024 at 3:45 PM shveta malik <[email protected]> wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <[email protected]> wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION <subname>
> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER
>     (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> 1b)
> ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
>     (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax  is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
>  ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
>  ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the  'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>


It makes sense to have a RESET on the lines of (a) and (b). At this
stage, we should do minimal in extending the syntax. How about RESET
CONFLICT RESOLVER ALL for (a)?

> ~~
>
> 3)  Regarding update_exists:
>
> 3a)
> Currently update_exists resolver patch is kept separate. The reason
> being, it performs resolution which will need deletion of multiple
> rows. It will be good to discuss if we want to target this in the
> first draft. Please see the example:
>
> create table tab (a int primary key, b int unique, c int unique);
>
> Pub: insert into tab  values (1,1,1);
> Sub:
> insert into tab  values (2,20,30);
> insert into tab values (3,40,50);
> insert into tab values (4,60,70);
>
> Pub: update tab set a=2,b=40,c=70 where a=1;
>
> The above 'update' on pub will result in 'update_exists' on sub and if
> resolution is in favour of 'apply', then it will conflict with all the
> three local rows of subscriber due to unique constraint present on all
> three columns. Thus in order to resolve the conflict, it will have to
> delete these 3 rows on sub:
>
> 2,20,30
> 3,40,50
> 4,60,70
> and then update 1,1,1 to 2,40,70.
>
> Just need opinion on if we shall target this in the initial draft.
>

This case looks a bit complicated. It seems there is no other
alternative than to delete the multiple rows. It is better to create a
separate top-up patch for this and we can discuss in detail about this
once the basic patch is in better shape.

> 3b)
> If we plan to implement this, we need to work on optimal design where
> we can find all the conflicting rows at once and delete those.
> Currently the implementation has been done using recursion i.e. find
> one conflicting row, then delete it and then next and so on i.e. we
> call  apply_handle_update_internal() recursively. On initial code
> review, I feel it is doable to scan all indexes at once and get
> conflicting-tuple-ids in one go and get rid of recursion. It can be
> attempted once we decide on 3a.
>

I suggest following the simplest strategy (even if that means calling
the update function recursively) by adding comments on the optimal
strategy. We can optimize it later as well.

> ~~
>
> 4)
> Now for insert_exists and update_exists, we are doing a pre-scan of
> all unique indexes to find conflict. Also there is post-scan to figure
> out if the conflicting row is inserted meanwhile. This needs to be
> reviewed for optimization. We need to avoid pre-scan wherever
> possible. I think the only case for which it can be avoided is
> 'ERROR'. For the cases where resolver is in favor of remote-apply, we
> need to check conflict beforehand to avoid rollback of already
> inserted data. And for the case where resolver is in favor of skipping
> the change, then too we should know beforehand about the conflict to
> avoid heap-insertion and rollback. Thoughts?
>

It makes sense to skip the pre-scan wherever possible. Your analysis
sounds reasonable to me.

> ~~
>
> 5)
> Currently we only capture update_missing conflict i.e. we are not
> distinguishing between the missing row and the deleted row. We had
> discussed this in the past a couple of times. If we plan to target it
> in draft 1, I can dig up all old emails and resume discussion on this.
>

This is a separate conflict detection project in itself. I am thinking
about the solution to this problem. We will talk about this in a
separate thread.

> ~~
>
> 6)
> Table-level resolves. There was a suggestion earlier to implement
> table-level resolvers. The patch has been implemented to some extent,
> it can be completed and posted when we are done reviewing subscription
> level resolvers.
>

Yeah, it makes sense to do it after the subscription-level resolution
patch is ready.

-- 
With Regards,
Amit Kapila.

Re: Conflict Detection and Resolution

Reply via email to