Re: Do we want a hashset type?

Joel Jacobson Fri, 09 Jun 2023 03:59:00 -0700

On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote:
> Would you be interested in helping with / working on some of that? I
> don't have immediate need for this stuff, so it's not very high on my
> TODO list.


Sure, I'm willing to help!

I've attached a patch that works on some of the items on your list,
including some additions to the README.md.

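There were a bunch of places where `maxelements / 8` caused bugs, since
integer division rounds down and undercounts the bitmap bytes whenever
maxelements is not a multiple of 8. These had to be changed to do proper
integer ceiling division: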

-       values = (int32 *) (set->data + set->maxelements / 8);
+       values = (int32 *) (set->data + (set->maxelements + 7) / 8);

Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV macros
to the PostgreSQL source code in general, since it's easy to make this mistake,
and quite verbose/error-prone to write it out manually everywhere.
Such macros could simplify code in e.g. numeric.c.
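
To make the idea concrete, here is a minimal sketch of what such macros
might look like. The definitions below are hypothetical, not existing
PostgreSQL macros, and they assume a non-negative dividend and a positive
divisor; C division truncates toward zero, so negative operands would need
separate handling. bitmap_bytes() is likewise only an illustrative helper
and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers; these names are assumptions, not existing
 * PostgreSQL macros. Both assume a >= 0 and b > 0.
 */
#define FLOOR_DIV(a, b) ((a) / (b))
#define CEIL_DIV(a, b)  (((a) + (b) - 1) / (b))

/* Bytes needed for a bitmap that holds one bit per element slot. */
static size_t
bitmap_bytes(int maxelements)
{
    assert(maxelements >= 0);
    return (size_t) CEIL_DIV(maxelements, 8);
}
```

With maxelements = 9, FLOOR_DIV(9, 8) gives 1 byte, which is one byte too
few for the ninth bit, while CEIL_DIV(9, 8) gives the required 2 bytes.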

> There's a bunch of stuff that needs to be improved to make this properly
> usable, like:
>
> 1) better hash table implementation
TODO

> 2) input/output functions
I've attempted to implement these.
I thought comma-separated values wrapped in curly braces felt like the
most natural format, for example:
SELECT '{1,2,3}'::hashset;
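
For illustration only, parsing this text format could be sketched roughly
as below. This is a standalone example and not the input function from the
attached patch: parse_int_set() and its signature are hypothetical, it
writes into a caller-supplied array instead of building a hashset, and it
does not remove duplicates.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical standalone parser, not the code from the patch.
 * Parses "{1,2,3}" into out[] (capacity cap) and returns the number of
 * values read, or -1 on malformed input, out-of-range values, or when
 * more than cap values are present.
 */
static int
parse_int_set(const char *str, int32_t *out, int cap)
{
    int n = 0;

    assert(str != NULL && out != NULL);

    while (isspace((unsigned char) *str))
        str++;
    if (*str++ != '{')
        return -1;

    while (isspace((unsigned char) *str))
        str++;
    if (*str == '}')
        return (str[1] == '\0') ? 0 : -1;   /* empty set "{}" */

    for (;;)
    {
        char *end;
        long  val;

        errno = 0;
        val = strtol(str, &end, 10);        /* skips leading whitespace */
        if (end == str || errno == ERANGE ||
            val < INT32_MIN || val > INT32_MAX)
            return -1;
        if (n >= cap)
            return -1;
        out[n++] = (int32_t) val;

        str = end;
        while (isspace((unsigned char) *str))
            str++;
        if (*str == ',')
        {
            str++;                          /* another value must follow */
            continue;
        }
        if (*str == '}')
            return (str[1] == '\0') ? n : -1;
        return -1;                          /* unexpected character */
    }
}
```

A real input function would insert each parsed value into the set, which
also takes care of duplicates, and would report errors through the usual
PostgreSQL error machinery rather than a return code.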

> 3) support for other types (now it only works with int32)
TODO

> 4) I wonder if this might be done as an array-like polymorphic type.
That would be nice!
I guess the work-around would be to store the actual values of non-int types
in a lookup table, and then hash the int-based primary key of that table.

Do you think later implementing polymorphic type support would
mean a more or less complete rewrite, or can we carry on with int32-support
and add it later on?

> 5) more efficient storage format, with versioning etc.
TODO

> 6) regression tests
I've added some regression tests.

> Right. IMHO the query language is a separate thing, you still need to
> evaluate the query somehow - which is where hashset applies.

Good point, I fully agree.

/Joel

hashset-1.0.0-joel-0001.patch
Description: Binary data
