On Wed, Dec 10, 2014 at 1:50 PM, Heikki Linnakangas <hlinnakan...@vmware.com > wrote:
> On 01/28/2014 04:12 PM, Alexander Korotkov wrote: > >> >3. A binary heap would be a better data structure to buffer the rechecked >>> >values. A Red-Black tree allows random insertions and deletions, but in >>> >this case you need to insert arbitrary values but only remove the >>> minimum >>> >item. That's exactly what a binary heap excels at. We have a nice binary >>> >heap implementation in the backend that you can use, see >>> >src/backend/lib/binaryheap.c. >>> > >>> >> Hmm. For me binary heap would be a better data structure for KNN-GiST at >> all :-) >> > > I decided to give this a shot, replacing the red-black tree in GiST with > the binary heap we have in lib/binaryheap.c. It made the GiST code somewhat > simpler, as the binaryheap interface is simpler than the red-black tree > one. Unfortunately, performance was somewhat worse. That was quite > surprising, as insertions and deletions are both O(log N) in both data > structures, but the red-black tree implementation is more complicated. > > I implemented another data structure called a Pairing Heap. It's also a > fairly simple data structure, but insertions are O(1) instead of O(log N). > It also performs fairly well in practice. > > With that, I got a small but measurable improvement. To test, I created a > table like this: > > create table gisttest (id integer, p point); > insert into gisttest select id, point(random(), random()) from > generate_series(1, 1000000) id; > create index i_gisttest on gisttest using gist (p); > > And I ran this query with pgbench: > > select id from gisttest order by p <-> '(0,0)' limit 1000; > > With unpatched master, I got about 650 TPS, and with the patch 720 TPS. > That's a nice little improvement, but perhaps more importantly, the pairing > heap implementation consumes less memory. To measure that, I put a > MemoryContextStats(so->queueCtx) call into gistendscan. With the above > query, but without the "limit" clause, on master I got: > > GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 > chunks); 21296 used > > And with the patch: > > GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks); > 21072 used > > That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice > to reduce that memory consumption, as there is no hard upper bound on how > much might be needed. If the GiST tree is really disorganized for some > reason, a query might need a lot more. > > > So all in all, I quite like this patch, even though it doesn't do anything > too phenomenal. It adds a some code, in the form of the new pairing heap > implementation, but it makes the GiST code a little bit simpler. And it > gives a small performance gain, and reduces memory usage a bit. > > - Heikki > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > It may be better to replace the lib/binaryheap altogether as it offers comparable/better performance.