Re: Do we want a hashset type?

Tomas Vondra Thu, 08 Jun 2023 03:19:23 -0700

On 6/8/23 11:41, Joel Jacobson wrote:
> On Wed, Jun 7, 2023, at 19:37, Tomas Vondra wrote:
>> Interesting, considering how dumb the hash table implementation is.
> 
> That's promising.
>


Yeah, not bad for sleep-deprived on-plane hacking ...

There's a bunch of stuff that needs to be improved to make this properly
usable, like:

1) better hash table implementation

2) input/output functions

3) support for other types (now it only works with int32)

4) I wonder if this might be done as an array-like polymorphic type.

5) more efficient storage format, with versioning etc.

6) regression tests
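
To make item 1 a bit more concrete, here is a minimal sketch of what a
less naive int32 hash table could look like: open addressing with linear
probing, a power-of-two capacity, and resizing at a 75% load factor. All
names here (int32_set, set_add, etc.) are hypothetical and are not taken
from the prototype; it also uses plain malloc/calloc rather than palloc
and skips allocation error checks, so treat it as an illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure, not the one used in the prototype. */
typedef struct int32_set
{
    uint32_t capacity;   /* number of slots, always a power of two */
    uint32_t nelements;  /* number of occupied slots */
    bool    *used;       /* used[i] is true if values[i] holds an element */
    int32_t *values;     /* the stored elements */
} int32_set;

/* Simple integer mixer (assumed here; any decent hash would do). */
static uint32_t
hash_int32(int32_t v)
{
    uint32_t h = (uint32_t) v;

    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/* capacity must be a power of two (e.g. 16) */
static int32_set *
set_create(uint32_t capacity)
{
    int32_set *s = malloc(sizeof(int32_set));

    s->capacity = capacity;
    s->nelements = 0;
    s->used = calloc(capacity, sizeof(bool));
    s->values = calloc(capacity, sizeof(int32_t));
    return s;
}

static void
set_free(int32_set *s)
{
    free(s->used);
    free(s->values);
    free(s);
}

static bool
set_contains(const int32_set *s, int32_t v)
{
    uint32_t pos = hash_int32(v) & (s->capacity - 1);

    /* probe until we hit the value or an empty slot */
    while (s->used[pos])
    {
        if (s->values[pos] == v)
            return true;
        pos = (pos + 1) & (s->capacity - 1);
    }
    return false;
}

static void set_grow(int32_set *s);

/* Returns true if v was added, false if it was already present. */
static bool
set_add(int32_set *s, int32_t v)
{
    uint32_t pos;

    /* keep the load factor at or below 75% so probing always terminates */
    if ((s->nelements + 1) * 4 > s->capacity * 3)
        set_grow(s);

    pos = hash_int32(v) & (s->capacity - 1);
    while (s->used[pos])
    {
        if (s->values[pos] == v)
            return false;
        pos = (pos + 1) & (s->capacity - 1);
    }

    s->used[pos] = true;
    s->values[pos] = v;
    s->nelements++;
    return true;
}

/* Double the capacity and reinsert all existing elements. */
static void
set_grow(int32_set *s)
{
    int32_set old = *s;

    s->capacity = old.capacity * 2;
    s->nelements = 0;
    s->used = calloc(s->capacity, sizeof(bool));
    s->values = calloc(s->capacity, sizeof(int32_t));

    for (uint32_t i = 0; i < old.capacity; i++)
        if (old.used[i])
            set_add(s, old.values[i]);

    free(old.used);
    free(old.values);
}
```

A real implementation inside PostgreSQL would presumably keep everything
in a single varlena value so it can be stored and passed around as a
datum, which ties into items 2 and 5.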

Would you be interested in helping with / working on some of that? I
don't have an immediate need for this stuff, so it's not very high on my
TODO list.

>>> I tested Neo4j and the results are surprising; it appears to be 
>>> significantly *slower*.
>>> However, I've probably misunderstood something, maybe I need to add some 
>>> index or something.
>>> Even so, it's interesting it's apparently not fast "by default".
>>>
>>
>> No idea how to fix that, but it's rather suspicious.
> 
> I've had a graph-db expert review my benchmark, and he suggested adding an 
> index:
> 
> CREATE INDEX FOR (n:User) ON (n.id)
> 
> This did improve the execution time for Neo4j a bit, down from 819 ms to 528 
> ms, but PostgreSQL at 299 ms is still a win.
> 
> Benchmark here: https://github.com/joelonsql/graph-query-benchmarks
> Note, in this benchmark, I only test the naive RECURSIVE CTE approach using 
> array_agg(DISTINCT ...).
> And I couldn't even test the most connected user with Neo4j; the query never 
> finished for some reason,
> so I had to test with a less connected user.
> 

Interesting. I'd have expected the graph db to be much faster.

> The graph expert also said that other more realistic graph use-cases might be 
> "multi-relational",
> and pointed me to a link: 
> https://github.com/totogo/awesome-knowledge-graph#knowledge-graph-dataset
> No idea how such multi-relational datasets would affect the benchmarks.
> 

Not sure either, but I don't have the ambition to improve everything at
once. If the hashset improves one practical use case, fine with me.

> I think we already have strong indicators that PostgreSQL with a hashset type 
> will, from a relative
> performance perspective, do just fine processing basic graph queries, even 
> with large datasets.
> 
> Then there will always be the case when users primarily write very different 
> graph queries all day long,
> who might prefer a graph query *language*, like SQL/PGQ in SQL:2023, Cypher 
> or Gremlin.
> 

Right. IMHO the query language is a separate thing, you still need to
evaluate the query somehow - which is where hashset applies.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
