Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Amit Kapila Tue, 04 Jul 2023 22:47:07 -0700

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2...@gmail.com> wrote:
>
> Hi. Here are some review comments for this patch.
>
> +1 for the patch idea.
>
> ------
>
> I wasn't sure about the code comment adjustments suggested by Amit [1]:
> "Accordingly, the comments atop build_replindex_scan_key(),
> FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> should also be adjusted."
>
> Actually, I thought the FindUsableIndexForReplicaIdentityFull()
> function comment is *already* describing the limitation about the
> leftmost column (see fragment below), so IIUC the Sawada-san patch is
> only trying to expose that same information in the PG docs.
>
> [FindUsableIndexForReplicaIdentityFull comment fragment]
>  * We also skip indexes if the remote relation does not contain the leftmost
>  * column of the index. This is because in most such cases sequential scan is
>  * favorable over index scan.
>


This implies that the leftmost column of the index must be
non-expression but I feel what the patch intends to say in docs is
more straightforward and it doesn't match what the proposed docs says.

> ~
>
> OTOH, it may be better if these limitation rule details were not
> scattered in the code. e.g. build_replindex_scan_key() function
> comment can be simplified:
>
> CURRENT:
>  * This is not generic routine, it expects the idxrel to be a btree, 
> non-partial
>  * and have at least one column reference (i.e. cannot consist of only
>  * expressions).
>
> SUGGESTION:
> This is not a generic routine. It expects the 'idxrel' to be an index
> deemed "usable" by the function
> FindUsableIndexForReplicaIdentityFull().
>

Note that for PK/ReplicaIdentity, we don't even call
FindUsableIndexForReplicaIdentityFull() but build_replindex_scan_key()
would still be called for such index. So, I am not sure your proposed
wording is an improvement.

> ------
> doc/src/sgml/logical-replication.sgml
>
> 1.
>     the key.  When replica identity <literal>FULL</literal> is specified,
>     indexes can be used on the subscriber side for searching the rows.
> Candidate
>     indexes must be btree, non-partial, and have at least one column reference
> -   (i.e. cannot consist of only expressions).  These restrictions
> -   on the non-unique index properties adhere to some of the restrictions that
> -   are enforced for primary keys.  If there are no such suitable indexes,
> +   at the leftmost column indexes (i.e. cannot consist of only
> expressions).  These
> +   restrictions on the non-unique index properties adhere to some of
> the restrictions
> +   that are enforced for primary keys.  If there are no such suitable 
> indexes,
>     the search on the subscriber side can be very inefficient, therefore
>     replica identity <literal>FULL</literal> should only be used as a
>     fallback if no other solution is possible.  If a replica identity other
>
> Isn't this using the word "indexes" with different meanings in the
> same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> ordinal number of the index fields. TBH, I am not sure the patch
> wording is even describing the limitation in quite the same way as
> what the code is actually doing.
>
> HEAD (code comment):
>  * We also skip indexes if the remote relation does not contain the leftmost
>  * column of the index. This is because in most such cases sequential scan is
>  * favorable over index scan.
>
> HEAD (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions). These
> restrictions on the non-unique index properties adhere to some of the
> restrictions that are enforced for primary keys.
>
> PATCHED (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference at the leftmost column indexes (i.e. cannot consist
> of only expressions). These restrictions on the non-unique index
> properties adhere to some of the restrictions that are enforced for
> primary keys.
>
> MY SUGGESTION:
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions).
> Furthermore, the leftmost field of the candidate index must be a
> column of the published table. These restrictions on the non-unique
> index properties adhere to some of the restrictions that are enforced
> for primary keys.
>

I don't know if this suggestion is what the code is actually doing. In
function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
checks:
==========
keycol = indexInfo->ii_IndexAttrNumbers[0];
if (!AttributeNumberIsValid(keycol))
return false;

if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
return false;

return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
==========

The first of these checks indicates that the leftmost column of the
index should be non-expression, second and third indicates what you
suggest in your wording. We can also think that what you wrote in a
way is a superset of "leftmost index column is a non-expression" and
"leftmost index column should be present in remote rel" but I feel it
would be better to explicit about the first part as it is easy to
understand for users at least in docs.

-- 
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Reply via email to