The optimizer was a bit too clever. It used the same plan for the LEFT
JOIN. But that put me on the right track. I tried a LATERAL join. But the
optimizer saw through that too and used the same plan. So I tried a
materialized CTE and that finally forced it to use a different plan. That
made it  run in ~70ms -- about 18x faster. Thanks!

explain analyze
with r as materialized (
  select * from matching_rules
  where id >= 0 and id < 60
)
select r.id, i.id
from r
  join items i on i.name ~ r.name_matches
;

                                                     QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Nested Loop  (cost=2.78..714.20 rows=230 width=8) (actual
time=0.071..69.545 rows=702 loops=1)
   Join Filter: (i.name ~ r.name_matches)
   Rows Removed by Join Filter: 45298
   CTE r
     ->  Seq Scan on matching_rules  (cost=0.00..2.78 rows=46 width=26)
(actual time=0.007..0.047 rows=46 loops=1)
           Filter: ((id >= 0) AND (id < 60))
           Rows Removed by Filter: 6
   ->  CTE Scan on r  (cost=0.00..0.92 rows=46 width=36) (actual
time=0.008..0.090 rows=46 loops=1)
   ->  Materialize  (cost=0.00..23.00 rows=1000 width=27) (actual
time=0.000..0.081 rows=1000 loops=46)
         ->  Seq Scan on items i  (cost=0.00..18.00 rows=1000 width=27)
(actual time=0.003..0.092 rows=1000 loops=1)
 Planning Time: 0.206 ms
 Execution Time: 69.633 ms


On Wed, Aug 25, 2021 at 4:05 PM Justin Pryzby <pry...@telsasoft.com> wrote:

> On Wed, Aug 25, 2021 at 11:47:43AM -0500, Jack Christensen wrote:
> > I have items that need to be categorized by user defined matching rules.
> > Trusted users can create rules that include regular expressions. I've
> > reduced the problem to this example.
>
> > I use the following query to find matches:
> >
> > select r.id, i.id
> > from items i
> >   join matching_rules r on i.name ~ r.name_matches;
> >
> > When there are few rules the query runs quickly. But as the number of
> rules
> > increases the runtime often increases at a greater than linear rate.
>
> Maybe it's because the REs are cached by RE_compile_and_cache(), but if you
> loop over the REs in the inner loop, then the caching is ineffecive.
>
> Maybe you can force it to join with REs on the outer loop by writing it as:
> | rules LEFT JOIN items WHERE rules.id IS NOT NULL,
> ..to improve performance, or at least test that theory.
>
> --
> Justin
>

Reply via email to