On Fri, 29 Jul 2022 at 13:49, Simon Riggs <simon.ri...@enterprisedb.com> wrote: > > On Thu, 28 Jul 2022 at 19:50, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > > Simon Riggs <simon.ri...@enterprisedb.com> writes: > > > Thanks for the nudge. New version attached. > > > > I also see a speed improvement from this
> > --- > > DROP TABLE IF EXISTS hash_speed; > > CREATE unlogged TABLE hash_speed (x integer); > > INSERT INTO hash_speed SELECT random()*10000000 FROM > > generate_series(1,10000000) x; > > vacuum hash_speed; > > \timing on > > CREATE INDEX ON hash_speed USING hash (x); > > --- > > Also, it seems like we've left some money on the table by not > > exploiting downstream the knowledge that this sorting happened. > > During an index build, it's no longer necessary for > > _hash_pgaddtup to do _hash_binsearch, and therefore also not > > _hash_get_indextuple_hashkey: we could just always append the new > > tuple at the end. Perhaps checking it against the last existing > > tuple is worth the trouble as a bug guard, but for sure we don't > > need the log2(N) comparisons that _hash_binsearch will do. > > Hmm, I had that in an earlier version of the patch, not sure why it > dropped out since I wrote it last year, but then I've got lots of > future WIP patches in the area of hash indexes. ... > > At this point the cfbot will start to bleat that the patch of > > record doesn't apply, so I'm going to mark the CF entry committed. > > If anyone wants to produce a follow-on patch, please make a > > new entry. > > Will do. Thanks. Using the above test case, I'm getting a further 4-7% improvement on already committed code with the attached patch, which follows your proposal. The patch passes info via a state object, useful to avoid API churn in later patches. Adding to CFapp again. -- Simon Riggs http://www.EnterpriseDB.com/
hash_inserted_sorted.v2.patch
Description: Binary data