Re: Next Steps with Hash Indexes

Simon Riggs Tue, 05 Oct 2021 03:38:29 -0700

On Mon, 27 Sept 2021 at 06:52, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro <b.sa...@gmail.com> 
> > wrote:
> > >
> > > And to get the multi-column hash index selected, we may set
> > > enable_hashjoin =off, to avoid any condition become join condition,
> > > saw similar behaviors in other DBs as well...
> >
> > This may be related to Tom's point that, if some of the quals are
> > removed due to optimization or converted to join quals, then now, even
> > if the user has given qual on all the key columns the index scan will
> > not be selected because we will be forcing that the hash index can
> > only be selected if it has quals on all the key attributes?
> >
> > I don't think suggesting enable_hashjoin =off is a solution,
> >
>
> Yeah, this doesn't sound like a good idea. How about instead try to
> explore the idea where the hash (bucket assignment and search) will be
> based on the first index key and the other columns will be stored as
> payload? I think this might pose some difficulty in the consecutive
> patch to enable a unique index because it will increase the chance of
> traversing more buckets for uniqueness checks. If we see such
> problems, then I have another idea to minimize the number of buckets
> that we need to lock during uniqueness check which is by lock chaining
> as is used during hash bucket clean up where at a time we don't need
> to lock more than two buckets at a time.


I have presented a simple, almost trivial, patch to allow multi-col
hash indexes. It hashes the first column only, which can be a downside
in *some* cases. If that is clearly documented, it would not cause
many issues, IMHO. However, it does not have any optimization issues
or complexities, which is surely a very good thing.

Trying to involve *all* columns in the hash index is a secondary
optimization. It requires subtle changes in optimizer code, as Tom
points out. It also needs fine tuning to make the all-column approach
beneficial for the additional cases without losing against what the
"first column" approach gives.

I did consider both approaches and after this discussion I am still in
favour of committing the very simple "first column" approach to
multi-col hash indexes now.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Next Steps with Hash Indexes

Reply via email to