Btw: if you still run out of cache later with more regexes, maybe it makes sense to do prefiltering first by building a single gigantic regexp as string_agg('(' || name_matches || ')', '|') and then only running the individual regexes against the rows that match it. If PostgreSQL exposes the capturing groups, you may even be able to explode the result without a postfilter.
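Roughly something like this untested sketch (it borrows the materialized CTE trick from your query below just to keep the planner from flattening the prefilter, and it assumes the individual patterns still behave when OR'ed together):

    with combined as materialized (
        select string_agg('(' || name_matches || ')', '|') as any_rule
        from matching_rules
    ), candidates as materialized (
        -- prefilter: one combined regexp per item instead of one regexp
        -- per rule per item
        select i.*
        from items i, combined c
        where i.name ~ c.any_rule
    )
    select r.id, i.id
    from candidates i
    join matching_rules r on i.name ~ r.name_matches;  -- re-check survivors per rule

If you try the capturing-group route instead, regexp_match(i.name, c.any_rule) should return a text[] where the position of the non-NULL element tells you which rule hit, as long as the rules themselves don't contain capturing groups of their own.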
On Wed, Aug 25, 2021 at 2:22 PM Jack Christensen <j...@jncsoftware.com> wrote:
> The optimizer was a bit too clever. It used the same plan for the LEFT
> JOIN. But that put me on the right track. I tried a LATERAL join. But the
> optimizer saw through that too and used the same plan. So I tried a
> materialized CTE and that finally forced it to use a different plan. That
> made it run in ~70ms -- about 18x faster. Thanks!
>
> explain analyze
> with r as materialized (
>   select * from matching_rules
>   where id >= 0 and id < 60
> )
> select r.id, i.id
> from r
>   join items i on i.name ~ r.name_matches
> ;
>
>                                      QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────────────────────────────
>  Nested Loop  (cost=2.78..714.20 rows=230 width=8) (actual time=0.071..69.545 rows=702 loops=1)
>    Join Filter: (i.name ~ r.name_matches)
>    Rows Removed by Join Filter: 45298
>    CTE r
>      ->  Seq Scan on matching_rules  (cost=0.00..2.78 rows=46 width=26) (actual time=0.007..0.047 rows=46 loops=1)
>            Filter: ((id >= 0) AND (id < 60))
>            Rows Removed by Filter: 6
>    ->  CTE Scan on r  (cost=0.00..0.92 rows=46 width=36) (actual time=0.008..0.090 rows=46 loops=1)
>    ->  Materialize  (cost=0.00..23.00 rows=1000 width=27) (actual time=0.000..0.081 rows=1000 loops=46)
>          ->  Seq Scan on items i  (cost=0.00..18.00 rows=1000 width=27) (actual time=0.003..0.092 rows=1000 loops=1)
>  Planning Time: 0.206 ms
>  Execution Time: 69.633 ms
>
>
> On Wed, Aug 25, 2021 at 4:05 PM Justin Pryzby <pry...@telsasoft.com> wrote:
>
>> On Wed, Aug 25, 2021 at 11:47:43AM -0500, Jack Christensen wrote:
>> > I have items that need to be categorized by user defined matching rules.
>> > Trusted users can create rules that include regular expressions. I've
>> > reduced the problem to this example.
>> >
>> > I use the following query to find matches:
>> >
>> > select r.id, i.id
>> > from items i
>> >   join matching_rules r on i.name ~ r.name_matches;
>> >
>> > When there are few rules the query runs quickly. But as the number of rules
>> > increases the runtime often increases at a greater than linear rate.
>>
>> Maybe it's because the REs are cached by RE_compile_and_cache(), but if you
>> loop over the REs in the inner loop, then the caching is ineffective.
>>
>> Maybe you can force it to join with REs on the outer loop by writing it as:
>> | rules LEFT JOIN items WHERE rules.id IS NOT NULL,
>> ..to improve performance, or at least test that theory.
>>
>> --
>> Justin
>>