Hi, Andrey!

Thank you for working on this! This patch has a long history; I hope it
will be committed soon! :)

On Fri, Jul 11, 2025 at 3:39 PM Andrey Borodin <x4...@yandex-team.ru> wrote:
>
>
>
> > On 30 Jun 2025, at 16:34, Andrey Borodin <x4...@yandex-team.ru> wrote:
> >
> > Please find attached two new steps for amcheck:
> > 1. A function to verify GiST integrity. This patch is in decent shape, 
> > simply rebased from previous year.
> > 2. Support on pg_amcheck's side for this function. This patch did not 
> > receive such review attention before. And, perhaps, should be extended to 
> > support existing GIN functions.
>
> Here's a version that adds GIN functions to pg_amcheck.
> IDK, maybe we should split pg_amcheck part into another thread and add there 
> BRIN too...
>

Speaking of BRIN pg_amcheck support: I probably wouldn't merge it into
the gist/gin pg_amcheck patchset, because that would create a dependency
on the amcheck BRIN support patch, and it's not clear when that will be
ready.


There are some points about the patch:

1) There are several typos in verify_gist.c:

 downlinks -> downlink (header comment)
 discrepencies -> discrepancies
 Correctess -> Correctness
 hande -> handle
 Initaliaze -> Initialize
 numbmer -> number
 replcaed -> replaced
 aquire -> acquire

2) The copyright year in the patch is 2023. Time flies :)

3) The same check exists in btree, and while reviewing the patch I
realised it should be added to the BRIN amcheck as well. Probably it
will be needed for GIN someday too. What do you think about moving it
to verify_common? (A rough sketch of such a helper follows the quoted
check.)

    if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin &&
        !TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data),
                               result->snapshot->xmin))
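
    Something like this, perhaps (just a sketch: the helper name is made
up, and the ereport part is what I recall verify_nbtree does in this
case, so the wording should be double-checked):

    static void
    amcheck_reject_snapshot_unsafe_index(Relation rel, Snapshot snapshot)
    {
        /* index may not contain all entries visible to our transaction snapshot */
        if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin &&
            !TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data),
                                   snapshot->xmin))
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("index \"%s\" cannot be verified using transaction snapshot",
                            RelationGetRelationName(rel))));
    }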

4) Should we check the blkno of a new entry before pushing it onto the
stack? We could verify that it is a valid block number and that it is
not outside the index. That way we could give a more detailed error
message when we run into a wrong blkno (a sketch follows the two call
sites below).

    in the split detection code:

        ptr->blkno = GistPageGetOpaque(page)->rightlink;

    and when we add children of an inner page:

        ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
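
    Something along these lines, perhaps (just a sketch for the downlink
case; the local variable and the message wording are only examples):

        BlockNumber childblkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));

        /* downlink must point to an existing block of this index */
        if (childblkno == InvalidBlockNumber ||
            childblkno >= RelationGetNumberOfBlocks(check_state->rel))
            ereport(ERROR,
                    (errcode(ERRCODE_INDEX_CORRUPTED),
                     errmsg("index \"%s\" has downlink to invalid block %u on page %u",
                            RelationGetRelationName(check_state->rel),
                            childblkno, stack->blkno)));
        ptr->blkno = childblkno;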

5) There is a macro for 0xffff: 'TUPLE_IS_VALID'. Maybe we can use it
to make the code more readable? Also, the error message contains one
extra 'has' (a possible rewrite is sketched below the quoted code).

          if (off != 0xffff)
             ereport(ERROR,
                   (errcode(ERRCODE_INDEX_CORRUPTED),
                    errmsg("index \"%s\" has on page %u offset %u has item id not pointing to 0xffff, but %hu",
                           RelationGetRelationName(check_state->rel),
                           stack->blkno, i, off)));
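
    With the macro and without the duplicated 'has' it could look
something like this (just a sketch):

          if (off != TUPLE_IS_VALID)
             ereport(ERROR,
                   (errcode(ERRCODE_INDEX_CORRUPTED),
                    errmsg("index \"%s\" on page %u offset %u has item id not pointing to 0xffff, but %hu",
                           RelationGetRelationName(check_state->rel),
                           stack->blkno, i, off)));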

6) Several points about 'gistFormNormalizedTuple'. I read the previous
thread [1], and it seems there is an unfinished discussion about
normalizing during the heapallindexed check. I think normalization
needs some more work here.

6a) There is a TODO
          /* pfree(DatumGetPointer(old)); // TODO: this fails. Why? */

6b) AFAICS 'compress' is an optional support function. If an opclass
doesn't have a 'compress' function, 'gistCompressValues' leaves such
attributes as they are. Here we get attdata from the heap scan, and it
could be toasted. That means these checks can produce false positives:

    gistCompressValues(giststate, r, attdata, isnull, true, compatt);
    ...

    for (int i = 0; i < r->rd_att->natts; i++)
    {
        if (VARATT_IS_EXTERNAL(DatumGetPointer(compatt[i])))
           ereport(ERROR,
                 (errcode(ERRCODE_INDEX_CORRUPTED),
                 ....
        if (VARATT_IS_COMPRESSED(DatumGetPointer(compatt[i])))
        {
           if (i < IndexRelationGetNumberOfKeyAttributes(r))
              ereport(ERROR,

    Also, the 'VARATT_IS_EXTERNAL' check will always result in a false
positive for toasted INCLUDE attributes here. Reproducer:

    DROP TABLE IF EXISTS tbl;
    CREATE TABLE tbl(a point, t text);
    -- disable compression for 't', but allow it to be stored externally
    ALTER TABLE tbl ALTER COLUMN t SET STORAGE external;
    INSERT INTO tbl VALUES (point(random(), random()), repeat('a', 3000));
    CREATE INDEX tbl_idx ON tbl USING gist (a) INCLUDE (t);
    SELECT gist_index_check('tbl_idx', true);

    So I think we need to remove these checks completely.

6c) The current code doesn't normalize existing index tuples before
adding them to the bloom filter, which can also result in false
positives. In the reproducer below, the index is built while the column
uses plain storage, and during the check the column uses extended
storage; the same data therefore has a different binary representation,
and we get a false positive:

    DROP TABLE IF EXISTS tbl;
    CREATE TABLE tbl(a tsvector);
    CREATE INDEX tbl_idx ON tbl USING gist (a);
    ALTER TABLE tbl ALTER COLUMN a SET STORAGE plain;
    INSERT INTO tbl VALUES ('a'::tsvector);
    ALTER TABLE tbl ALTER COLUMN a SET STORAGE extended;
    SELECT gist_index_check('tbl_idx', true);

6d) At the end of 'gistFormNormalizedTuple' we have

        ItemPointerSetOffsetNumber(&(res->t_tid), 0xffff);

    I guess it follows gistFormTuple, but gistFormNormalizedTuple is
used only for leaf tuples, and we overwrite the offset number right
after the 'gistFormNormalizedTuple' call, so it looks like we can drop
it.


    In general, I think normalization here can follow the same logic as
in verify_nbtree. We could even reuse 'bt_normalize_tuple' as the
normalization function: it handles all the corner cases we can have in
GiST as well, such as short varlenas and differences in compression. It
contains just a few btree-specific lines, and everything else is valid
for GiST, so we only need to modify it a bit; I think we can move it to
verify_common. Then we would normalize every existing leaf index tuple
before adding it to the bloom filter, and during the probing phase we
could just use 'gistFormTuple' to build an index tuple and normalize it
before probing (rough sketch below). What do you think?
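
    Roughly what I have in mind (just a sketch; amcheck_normalize_tuple
is a made-up name for bt_normalize_tuple moved to verify_common, and its
exact signature would of course differ):

        /* while collecting existing leaf index tuples: */
        itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
        norm = amcheck_normalize_tuple(rel, itup);
        bloom_add_element(filter, (unsigned char *) norm, IndexTupleSize(norm));
        if (norm != itup)
            pfree(norm);

        /* during the heap scan (probing phase): */
        itup = gistFormTuple(giststate, rel, values, isnull, true);
        norm = amcheck_normalize_tuple(rel, itup);
        if (bloom_lacks_element(filter, (unsigned char *) norm, IndexTupleSize(norm)))
            ereport(ERROR, ...);   /* report the tuple as missing from the index */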

Thank you!

[1] 
https://www.postgresql.org/message-id/CAAhFRxiHCWe_6AmqGWZqYEkgN_uQG3Jgw0WgPw%2B0zO3_D-q4DA%40mail.gmail.com


Best regards,
Arseniy Mukhin

