Re: Next Steps with Hash Indexes

Simon Riggs Thu, 14 Oct 2021 00:48:35 -0700

On Wed, 13 Oct 2021 at 20:16, Peter Geoghegan <[email protected]> wrote:
>
> On Wed, Oct 13, 2021 at 3:44 AM Simon Riggs
> <[email protected]> wrote:
> > > IMO it'd be nice to show some numbers to support the claims that storing
> > > the extra hashes and/or 8B hashes is not worth it ...
> >
> > Using an 8-byte hash is possible, but only becomes effective when
> > 4-byte hash collisions get hard to manage. 8-byte hash also makes the
> > index 20% bigger, so it is not a good default.
>
> Are you sure? I know that nbtree index tuples for a single-column int8
> index are exactly the same size as those from a single column int4
> index, due to alignment overhead at the tuple level. So my guess is
> that hash index tuples (which use the same basic IndexTuple
> representation) work in the same way.


The hash index tuples are 20 bytes each. If that were rounded up to
8-byte alignment, they would be 24 bytes.

Using pageinspect, the max(live_items) on any data page (bucket or
overflow) is 407 items, so they can't be 24 bytes long.
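
As a rough check of that arithmetic, here is a minimal sketch. It
assumes a default 8192-byte block, a 24-byte page header and a 16-byte
hash special area at the end of the page; none of those constants are
stated above. It also treats the per-item figure as the whole footprint
of one entry on the page, without saying how that divides between tuple
and line pointer.

```python
# Assumed defaults for a stock PostgreSQL build (not taken from the message).
BLCKSZ = 8192        # default block size in bytes
PAGE_HEADER = 24     # fixed page header
HASH_SPECIAL = 16    # hash-specific opaque data at the end of each page


def max_items(per_item_bytes: int) -> int:
    """Upper bound on items per page for a given per-item footprint."""
    usable = BLCKSZ - PAGE_HEADER - HASH_SPECIAL
    return usable // per_item_bytes


for size in (20, 24):
    print(f"{size} bytes/item -> at most {max_items(size)} items per page")
```

Under those assumptions a 24-byte footprint could not reach 407 items
on one page, whereas a 20-byte footprint can.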


Another statistic of interest: the current bucket design/page
splitting is very effective at maintaining distribution. On a hash
index for a table with 2 billion rows in it, with integer values from
1 to 2 billion, there are 3670016 bucket pages and 524286 overflow
pages, distributed so that 87.5% of buckets have no overflow pages,
and 12.5% of buckets have only one overflow page; there are no buckets
with >1 overflow page. The most heavily populated overflow page has
209 items.

The CREATE INDEX time is fairly poor at present. That can be
optimized easily enough, but I expect to do it after uniqueness is
added, since doing the work in a different order would complicate
the code.

-- 
Simon Riggs                http://www.EnterpriseDB.com/
