Hello Jeff, Sadly, the query you suggested won't work because you are only returning the first row of the matching inner query rows. Example:
SELECT > u.idx, > u.page_idxs > FROM > ( > VALUES > (1, ARRAY[11, 21, 31]), > (2, ARRAY[12, 21, 32]), > (3, ARRAY[13, 23, 31]) > ) > AS u(idx, page_idxs) > WHERE > u.page_idxs && ARRAY[( > SELECT > p.idx > FROM > ( > VALUES > (11, ARRAY[101, 201, 301]), > (21, ARRAY[102, 201, 302]), > (13, ARRAY[103, 203, 301]) > ) > AS p(idx, attribute_idxs) > WHERE > p.attribute_idxs && ARRAY[201] > FETCH FIRST 1 ROWS ONLY > )] > ; This query only returns one row while it should actually return two: 1 {11,21,31} The INNER JOIN version of the query will return all matching rows but also include duplicates: SELECT > u.idx, > u.page_idxs > FROM > ( > VALUES > (1, ARRAY[11, 21, 31]), > (2, ARRAY[12, 21, 32]), > (3, ARRAY[13, 23, 31]) > ) > AS u(idx, page_idxs) > INNER JOIN > ( > SELECT > p.idx > FROM > ( > VALUES > (11, ARRAY[101, 201, 301]), > (21, ARRAY[102, 201, 302]), > (13, ARRAY[103, 203, 301]) > ) > AS p(idx, attribute_idxs) > WHERE > p.attribute_idxs && ARRAY[201] > ) > AS p2 > ON u.page_idxs && ARRAY[p2.idx] > ; Results: 1 {11,21,31} > 1 {11,21,31} > 2 {12,21,32} As far as I know, the the IN + sub-expression query can't work since the left side of the operation is an array of integers and the right side a set of rows with a single integer column. The reason I'm using integer arrays is because it is the only way I have found in PostgreSQL to get fast inclusion / exclusion checks on large datasets (hundreds of millions of values). Did I misunderstand your response? Thank you for the ongoing help, Mickael On Mon, May 23, 2022 at 4:45 AM Jeff Janes <jeff.ja...@gmail.com> wrote: > > > On Fri, May 20, 2022 at 6:42 AM Mickael van der Beek < > mickael.van.der.b...@gmail.com> wrote: > >> >> Query: >> >> EXPLAIN ( >>> ANALYZE, >>> VERBOSE, >>> COSTS, >>> BUFFERS, >>> TIMING >>> ) >>> SELECT >>> fu.w2_page_idxs >>> FROM >>> fact_users >>> AS fu >>> WHERE >>> EXISTS ( >>> SELECT >>> FROM >>> ( >>> SELECT >>> ARRAY[idx] AS page_idx >>> FROM >>> fact_pages >>> WHERE >>> attribute_idxs && ARRAY[300000160] >>> FETCH FIRST 1 ROWS ONLY >>> ) >>> AS fp >>> WHERE >>> fu.w2_page_idxs && fp.page_idx >>> ) >>> ; >> >> >> Without any surprises, the planner is using a sequential scan on the >> "fact_users" table which is very large instead of using the GIN index set >> on the "w2_page_idxs" column. >> > > For me, using the subquery in and expression, instead of the EXISTS, does > get it to use the gin index. And I think it must give the same results. > > SELECT > fu.w2_page_idxs > FROM fact_users AS fu > WHERE > fu.w2_page_idxs && ARRAY[(select idx from fact_pages where > attribute_idxs && ARRAY[3003] FETCH FIRST 1 ROWS ONLY)]; > > But why are you using intarray? That is unnecessary here, and by creating > ambiguity about the array operators it might be harmful. > > Cheers, > > Jeff > >> -- Mickael van der BeekWeb developer & Security analyst mickael.van.der.b...@gmail.com