David Steele <da...@pgmasters.net> writes: > Tom, do the changes as enumerated in [1] look like they are going in the > right direction?
I spent a little time looking at this, and realized something that may or may not be a serious problem. This form of the patch supposes that it can use the usual tuple form/deform logic for all columns of a leaf tuple including the key column. However, that does not square with SPGIST's existing storage convention for pass-by-value key types: we presently assume that those are stored in their Datum representation, ie always 4 or 8 bytes depending on machine word width, even when typlen is less than that. Now there is an argument why this might not be an unacceptable disk format breakage: there probably aren't any SPGIST indexes with a pass-by-value leaf key type. We certainly haven't got any such opclasses in core, and it's a bit hard to see what the semantics or use-case would be for indexing bools or smallints with SPGIST. However, doing nothing isn't okay, because if anyone did make such an opclass in future, it'd misbehave with this patch (since SGLTDATUM would disagree with the actual storage layout). There are a number of options we could consider: 1. Disallow pass-by-value leafType, checking this in spgGetCache(). The main advantage of this IMV is that if anyone has written such an opclass already, it'd break noisily rather than silently misbehave. It'd also allow simplification of SGLTDATUM by removing its pass-by-value case, which is kind of nice. 2. Accept the potential format breakage, and keep the patch mostly as-is but adjust SGLTDATUM to do the right thing depending on typlen. 3. Decide that we need to preserve the existing rule. We could hackily still use the standard tuple form/deform logic if we told it that the datatype of a pass-by-value key column is INT4 or INT8, depending on sizeof(Datum). But that could be rather messy. Another thing I notice in this immediate area is that the code presently assumes it can apply SGLTDATUM even to leaf tuples that store a null key. That's perfectly okay for pass-by-ref key types, since it just means we compute an address we're not going to dereference. But it's really rather broken for pass-by-value cases: it'll fetch a word from past the end of the tuple. Given recent musings about making the holes in the middle of pages undefined per valgrind, I wonder whether we aren't going to be forced to clean that up. Choice #1 looks a little more attractive with that in mind: it'd mean there's nothing to fix. A couple of other observations: * Making doPickSplit deform all the tuples at once, and thereby need fairly large work arrays (which it leaks), seems kind of ugly. Couldn't we leave the deforming to the end, and do it one tuple at a time just as we form the derived tuples? (Then you could use fixed-size local arrays of length INDEX_MAX_KEYS.) Could probably remove the heapPtrs[] array that way, too. * Personally I would not have changed the API of spgFormLeafTuple to take only the TupleDesc and not the whole SpGistState. That doesn't seem to buy anything, and we'd have to undo it in future if spgFormLeafTuple ever needs access to any of the rest of the SpGistState. * The amount of random whitespace changes in the patch is really rather annoying. Please run the code through pgindent to undo unnecessary changes to existing code lines, and also look through it to remove unnecessary additions and removals of blank lines. regards, tom lane