Hi,
On Wed, Sep 9, 2020 at 9:43 AM Andrey M. Borodin <x4...@yandex-team.ru> wrote: > > > > 9 сент. 2020 г., в 00:05, Heikki Linnakangas <hlinn...@iki.fi> > написал(а): > > > > I've been reviewing the patch today. The biggest changes I've made have > been in restructuring the code in gistbuild.c for readability, but there > are a bunch of smaller changes throughout. Attached is what I've got so > far, squashed into one patch. > Thanks! > > > I'm continuing to review it, but a couple of questions so far: > > > > In the gistBuildCallback(), you're skipping the tuple if 'tupleIsAlive > == false'. That seems fishy, surely we need to index recently-dead tuples, > too. The normal index build path isn't skipping them either. > That's an oversight. > > > > How does the 'sortsupport' routine interact with > 'compress'/'decompress'? Which representation is passed to the comparator > routine: the original value from the table, the compressed representation, > or the decompressed representation? Do the comparetup_index_btree() and > readtup_index() routines agree with that? > > Currently we pass compressed values, which seems not very good. > But there was a request from PostGIS maintainers to pass values before > decompression. > Darafei, please, correct me if I'm wrong. Also can you please provide link > on PostGIS B-tree sorting functions? > We were expecting to reuse btree opclass for this thing. This way btree_gist extension will become a lot thinner. :) Core routine for current sorting implementation is Hilbert curve, which is based on 2D center of a box - and used for abbreviated sort: https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gbox.c#L893 All the btree functions are wrappers around gserialized_cmp which just adds a bunch of tiebreakers that don't matter in practice: https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gserialized.c#L313 Base representation for index compressed datatype is GIDX, which is also a box. We can make it work on top of it instead of the original representation. There is no such thing as "decompressed representation" unfortunately as compression is lossy.