GIN Index has O(N^2) complexity for array overlap operator?

Felix Geisendörfer Fri, 07 Sep 2018 13:48:01 -0700

Hi everybody,

I ran into an issue with using the && array operator on a GIN index of mine. 
Basically I have a query that looks like this:


SELECT * FROM example WHERE keys && ARRAY[...];

This works fine for a small number of array elements (N), but gets really slow 
as N gets bigger in what appears to be O(N^2) complexity.

However, from studying the GIN data structure as described by the docs, it 
seems that the performance for this could be O(N). In fact, it's possible to 
coerce the query planner into an O(N) plan like this:

SELECT DISTINCT ON (example.id) * FROM unnest(ARRAY[...]) key JOIN example ON 
keys && ARRAY[key]

In order to illustrate this better, I've created a jupyter notebook that 
populates an example table, show the query plans for both queries, and most 
importantly benchmarks them and plots a time vs array size (N) graph.

https://github.com/felixge/pg-slow-gin/blob/master/pg-slow-gin.ipynb 
<https://github.com/felixge/pg-slow-gin/blob/master/pg-slow-gin.ipynb>

Please help me understand what causes the O(N^2) performance for query 1 and if 
query 2 is the best way to work around this issue.

Thanks
Felix Geisendörfer

PS: I'm using Postgres 10, but also verified that this problem exists with 
Postgres 11.

GIN Index has O(N^2) complexity for array overlap operator?

Reply via email to