Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Amit Kapila Mon, 18 Jul 2022 01:52:00 -0700

On Mon, Jul 18, 2022 at 11:59 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkal...@gmail.com> wrote:
> >
> > Hi hackers,
> >
> >
> > It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, 
> > because it leads to full table scan
> >
> > per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` 
> > impracticable -- probably other
> >
> > than some small number of use cases.
> >
>
> IIUC, this proposal is to optimize cases where users can't have a
> unique/primary key for a relation on the subscriber and those
> relations receive lots of updates or deletes?
>
> > With this patch, I'm proposing the following change: If there is an index 
> > on the subscriber, use the index
> >
> > as long as the planner sub-modules picks any index over sequential scan.
> >
> > Majority of the logic on the subscriber side has already existed in the 
> > code. The subscriber is already
> >
> > capable of doing (unique) index scans. With this patch, we are allowing the 
> > index to iterate over the
> >
> > tuples fetched and only act when tuples are equal. The ones familiar with 
> > this part of the code could
> >
> > realize that the sequential scan code on the subscriber already implements 
> > the `tuples_equal()` function.
> >
> > In short, the changes on the subscriber are mostly combining parts of 
> > (unique) index scan and
> >
> > sequential scan codes.
> >
> > The decision on whether to use an index (or which index) is mostly derived 
> > from planner infrastructure.
> >
> > The idea is that on the subscriber we have all the columns. So, construct 
> > all the `Path`s with the
> >
> > restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND 
> > col_n = $N`. Finally, let
> >
> > the planner sub-module -- `create_index_paths()` -- to give us the relevant 
> >  index `Path`s. On top of
> >
> > that adds the sequential scan `Path` as well. Finally, pick the cheapest 
> > `Path` among.
> >
> > From the performance point of view, there are few things to note. First, 
> > the patch aims not to
> > change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when 
> > REPLICA IDENTITY
> > IS FULL on the publisher and an index is used on the subscriber, the 
> > difference mostly comes down
> > to `index scan` vs `sequential scan`. That's why it is hard to claim a 
> > certain number of improvements.
> > It mostly depends on the data size, index and the data distribution.
> >
>
> It seems that in favorable cases it will improve performance but we
> should consider unfavorable cases as well. Two things that come to
> mind in that regard are (a) while choosing index/seq. scan paths, the
> patch doesn't account for cost for tuples_equal() which needs to be
> performed for index scans, (b) it appears to me that the patch decides
> which index to use the first time it opens the rel (or if the rel gets
> invalidated) on subscriber and then for all consecutive operations it
> uses the same index. It is quite possible that after some more
> operations on the table, using the same index will actually be
> costlier than a sequence scan or some other index scan.
>


Point (a) won't matter because we perform tuples_equal both for
sequence and index scans. So, we can ignore point (a).

-- 
With Regards,
Amit Kapila.

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Reply via email to