On Mon, Jul 18, 2022 at 11:59 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkal...@gmail.com> wrote: > > > > Hi hackers, > > > > > > It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, > > because it leads to full table scan > > > > per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` > > impracticable -- probably other > > > > than some small number of use cases. > > > > IIUC, this proposal is to optimize cases where users can't have a > unique/primary key for a relation on the subscriber and those > relations receive lots of updates or deletes? > > > With this patch, I'm proposing the following change: If there is an index > > on the subscriber, use the index > > > > as long as the planner sub-modules picks any index over sequential scan. > > > > Majority of the logic on the subscriber side has already existed in the > > code. The subscriber is already > > > > capable of doing (unique) index scans. With this patch, we are allowing the > > index to iterate over the > > > > tuples fetched and only act when tuples are equal. The ones familiar with > > this part of the code could > > > > realize that the sequential scan code on the subscriber already implements > > the `tuples_equal()` function. > > > > In short, the changes on the subscriber are mostly combining parts of > > (unique) index scan and > > > > sequential scan codes. > > > > The decision on whether to use an index (or which index) is mostly derived > > from planner infrastructure. > > > > The idea is that on the subscriber we have all the columns. So, construct > > all the `Path`s with the > > > > restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND > > col_n = $N`. Finally, let > > > > the planner sub-module -- `create_index_paths()` -- to give us the relevant > > index `Path`s. On top of > > > > that adds the sequential scan `Path` as well. Finally, pick the cheapest > > `Path` among. > > > > From the performance point of view, there are few things to note. First, > > the patch aims not to > > change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when > > REPLICA IDENTITY > > IS FULL on the publisher and an index is used on the subscriber, the > > difference mostly comes down > > to `index scan` vs `sequential scan`. That's why it is hard to claim a > > certain number of improvements. > > It mostly depends on the data size, index and the data distribution. > > > > It seems that in favorable cases it will improve performance but we > should consider unfavorable cases as well. Two things that come to > mind in that regard are (a) while choosing index/seq. scan paths, the > patch doesn't account for cost for tuples_equal() which needs to be > performed for index scans, (b) it appears to me that the patch decides > which index to use the first time it opens the rel (or if the rel gets > invalidated) on subscriber and then for all consecutive operations it > uses the same index. It is quite possible that after some more > operations on the table, using the same index will actually be > costlier than a sequence scan or some other index scan. >
Point (a) won't matter because we perform tuples_equal both for sequence and index scans. So, we can ignore point (a). -- With Regards, Amit Kapila.