Re: Yet another fast GiST build

Komяpa Wed, 09 Sep 2020 00:06:12 -0700

Hi,


On Wed, Sep 9, 2020 at 9:43 AM Andrey M. Borodin <[email protected]>
wrote:

>
>
> > On 9 Sep 2020, at 00:05, Heikki Linnakangas <[email protected]> wrote:
> >
> > I've been reviewing the patch today. The biggest changes I've made have
> > been in restructuring the code in gistbuild.c for readability, but there
> > are a bunch of smaller changes throughout. Attached is what I've got so
> > far, squashed into one patch.
> Thanks!
>
> > I'm continuing to review it, but a couple of questions so far:
> >
> > In the gistBuildCallback(), you're skipping the tuple if 'tupleIsAlive
> > == false'. That seems fishy, surely we need to index recently-dead tuples,
> > too. The normal index build path isn't skipping them either.
> That's an oversight.
> >
> > How does the 'sortsupport' routine interact with
> > 'compress'/'decompress'? Which representation is passed to the comparator
> > routine: the original value from the table, the compressed representation,
> > or the decompressed representation? Do the comparetup_index_btree() and
> > readtup_index() routines agree with that?
>
> Currently we pass compressed values, which seems not very good.
> But there was a request from PostGIS maintainers to pass values before
> decompression.
> Darafei, please, correct me if I'm wrong. Also can you please provide link
> on PostGIS B-tree sorting functions?
>

We were expecting to reuse the btree opclass for this. That way the
btree_gist extension would become a lot thinner. :)

The core routine of the current sorting implementation is a Hilbert curve,
which is based on the 2D center of a box and is used for the abbreviated sort:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gbox.c#L893
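
To make the idea concrete, here is a minimal sketch of a Hilbert key computed
from the 2D center of a box. It is not the PostGIS code linked above: the
Box2D type, the 2^16 grid resolution, the explicit "extent" argument used for
normalization, and all function names are assumptions made for illustration.
The curve mapping itself is the textbook xy-to-distance algorithm.

```c
#include <stdint.h>

#define HILBERT_N 65536u  /* grid cells per axis (2^16); an assumed resolution */

/* Hypothetical 2D box type for illustration; not the PostGIS GBOX struct. */
typedef struct { double xmin, ymin, xmax, ymax; } Box2D;

/* Rotate/flip a quadrant so the sub-curve gets the right orientation. */
static void hilbert_rot(uint32_t n, uint32_t *x, uint32_t *y,
                        uint32_t rx, uint32_t ry)
{
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        uint32_t t = *x;
        *x = *y;
        *y = t;
    }
}

/* Map cell (x, y) of an n-by-n grid (n a power of two, n <= 65536)
 * to its distance along the Hilbert curve. */
static uint32_t hilbert_xy2d(uint32_t n, uint32_t x, uint32_t y)
{
    uint32_t d = 0;
    for (uint32_t s = n / 2; s > 0; s /= 2) {
        uint32_t rx = (x & s) > 0;
        uint32_t ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        hilbert_rot(n, &x, &y, rx, ry);
    }
    return d;
}

/* Scale v from [lo, hi] onto a grid coordinate in [0, HILBERT_N - 1],
 * clamping values that fall outside the range. */
static uint32_t scale_to_grid(double v, double lo, double hi)
{
    if (hi <= lo)
        return 0;
    double t = (v - lo) / (hi - lo);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return (uint32_t)(t * (double)(HILBERT_N - 1));
}

/* Sort key for a box: Hilbert distance of its center within `extent`. */
static uint32_t box_hilbert_key(const Box2D *box, const Box2D *extent)
{
    double cx = (box->xmin + box->xmax) / 2.0;
    double cy = (box->ymin + box->ymax) / 2.0;
    return hilbert_xy2d(HILBERT_N,
                        scale_to_grid(cx, extent->xmin, extent->xmax),
                        scale_to_grid(cy, extent->ymin, extent->ymax));
}
```

A 32-bit key like this is the kind of value that fits an abbreviated sort key:
boxes whose centers are close in 2D tend to get nearby keys.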


All the btree functions are wrappers around gserialized_cmp, which just adds
a bunch of tiebreakers that don't matter in practice:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gserialized.c#L313
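
The general shape of "primary key plus tiebreakers" can be sketched as a
qsort-style comparator. This is only an illustration: the SortItem struct and
the choice and order of tiebreakers (box coordinates) are assumptions here,
not what gserialized_cmp actually does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sort entry: a precomputed curve key plus the box it came from. */
typedef struct {
    uint32_t key;
    double xmin, ymin, xmax, ymax;
} SortItem;

/* Three-way comparison of two doubles (NaN handling is left out). */
static int cmp_double(double a, double b)
{
    return (a > b) - (a < b);
}

/* Compare by curve key first; fall back to box coordinates only on ties,
 * so that the ordering is total and deterministic. */
static int sort_item_cmp(const void *pa, const void *pb)
{
    const SortItem *a = pa;
    const SortItem *b = pb;
    int c;

    if (a->key != b->key)
        return a->key < b->key ? -1 : 1;
    if ((c = cmp_double(a->xmin, b->xmin)) != 0) return c;
    if ((c = cmp_double(a->ymin, b->ymin)) != 0) return c;
    if ((c = cmp_double(a->xmax, b->xmax)) != 0) return c;
    return cmp_double(a->ymax, b->ymax);
}
```

Such a comparator can be passed directly to qsort(); the tiebreakers only
affect the relative order of entries that share the same key.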

The base representation for the index's compressed datatype is GIDX, which
is also a box. We can make the sort work on top of it instead of the original
representation.
Unfortunately there is no such thing as a "decompressed representation", as
the compression is lossy.
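
A small sketch of why a box-based compression cannot be undone: two different
geometries can compress to exactly the same box. The Point2D and Box2D types
and compress_to_box() are hypothetical names for illustration, not the GIDX
layout or the PostGIS compress function.

```c
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { double x, y; } Point2D;
typedef struct { double xmin, ymin, xmax, ymax; } Box2D;

/* "Compress" a geometry, given as n >= 1 vertices, to its bounding box.
 * The vertices themselves are discarded, which is what makes this lossy. */
static Box2D compress_to_box(const Point2D *pts, size_t n)
{
    Box2D b = { pts[0].x, pts[0].y, pts[0].x, pts[0].y };
    for (size_t i = 1; i < n; i++) {
        if (pts[i].x < b.xmin) b.xmin = pts[i].x;
        if (pts[i].y < b.ymin) b.ymin = pts[i].y;
        if (pts[i].x > b.xmax) b.xmax = pts[i].x;
        if (pts[i].y > b.ymax) b.ymax = pts[i].y;
    }
    return b;
}
```

For example, a diagonal segment from (0, 0) to (2, 2) and a square with the
same corners both compress to the box (0, 0, 2, 2), so nothing computed from
the box alone can tell them apart.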
