Re: [PATCH] Covering SPGiST index

Tom Lane Thu, 11 Mar 2021 14:18:54 -0800

David Steele <da...@pgmasters.net> writes:
> Tom, do the changes as enumerated in [1] look like they are going in the 
> right direction?


I spent a little time looking at this, and realized something that may
or may not be a serious problem.  This form of the patch supposes that
it can use the usual tuple form/deform logic for all columns of a leaf
tuple including the key column.  However, that does not square with
SPGIST's existing storage convention for pass-by-value key types: we
presently assume that those are stored in their Datum representation,
ie always 4 or 8 bytes depending on machine word width, even when
typlen is less than that.

Now there is an argument why this might not be an unacceptable disk
format breakage: there probably aren't any SPGIST indexes with a
pass-by-value leaf key type.  We certainly haven't got any such
opclasses in core, and it's a bit hard to see what the semantics or
use-case would be for indexing bools or smallints with SPGIST.
However, doing nothing isn't okay, because if anyone did make such
an opclass in future, it'd misbehave with this patch (since SGLTDATUM
would disagree with the actual storage layout).

There are a number of options we could consider:

1. Disallow pass-by-value leafType, checking this in spgGetCache().
The main advantage of this IMV is that if anyone has written such an
opclass already, it'd break noisily rather than silently misbehave.
It'd also allow simplification of SGLTDATUM by removing its
pass-by-value case, which is kind of nice.

2. Accept the potential format breakage, and keep the patch mostly
as-is but adjust SGLTDATUM to do the right thing depending on typlen.

3. Decide that we need to preserve the existing rule.  We could hackily
still use the standard tuple form/deform logic if we told it that the
datatype of a pass-by-value key column is INT4 or INT8, depending on
sizeof(Datum).  But that could be rather messy.

Another thing I notice in this immediate area is that the code
presently assumes it can apply SGLTDATUM even to leaf tuples that
store a null key.  That's perfectly okay for pass-by-ref key types,
since it just means we compute an address we're not going to
dereference.  But it's really rather broken for pass-by-value cases:
it'll fetch a word from past the end of the tuple.  Given recent
musings about making the holes in the middle of pages undefined per
valgrind, I wonder whether we aren't going to be forced to clean that
up.  Choice #1 looks a little more attractive with that in mind: it'd
mean there's nothing to fix.

A couple of other observations:

* Making doPickSplit deform all the tuples at once, and thereby need
fairly large work arrays (which it leaks), seems kind of ugly.
Couldn't we leave the deforming to the end, and do it one tuple at
a time just as we form the derived tuples?  (Then you could use
fixed-size local arrays of length INDEX_MAX_KEYS.)  Could probably
remove the heapPtrs[] array that way, too.

* Personally I would not have changed the API of spgFormLeafTuple
to take only the TupleDesc and not the whole SpGistState.  That
doesn't seem to buy anything, and we'd have to undo it in future
if spgFormLeafTuple ever needs access to any of the rest of the
SpGistState.

* The amount of random whitespace changes in the patch is really
rather annoying.  Please run the code through pgindent to undo
unnecessary changes to existing code lines, and also look through
it to remove unnecessary additions and removals of blank lines.

                        regards, tom lane

Re: [PATCH] Covering SPGiST index

Reply via email to