"Joel Jacobson" <j...@compiler.org> writes: >> I've produced a new dataset which now also includes the regex flags (if >> any) used for each subject applied to a pattern.
Again, thanks for collecting this data! I'm a little confused about how you produced the results in the "tests" table, though. It sort of looks like you tried to feed the Javascript flags to regexp_match(), which unsurprisingly doesn't work all that well. Even discounting that, I'm not getting quite the same results, and I don't understand why not. So how was that made from the raw "patterns" and "subjects" tables? > PostgreSQL 13.2 on x86_64-apple-darwin19.6.0, compiled by Apple clang version > 11.0.3 (clang-1103.0.32.62), 64-bit > Time: 758514.703 ms (12:38.515) > Time: 755883.600 ms (12:35.884) > Time: 746522.107 ms (12:26.522) > > PostgreSQL 14devel on x86_64-apple-darwin20.3.0, compiled by Apple clang > version 12.0.0 (clang-1200.0.32.29), 64-bit > HEAD (4e703d671) > Time: 519620.646 ms (08:39.621) > Time: 518998.366 ms (08:38.998) > Time: 519696.129 ms (08:39.696) Hmmm ... we haven't yet committed any performance-relevant changes to the regex code, so it can't take any credit for this improvement from 13.2 to HEAD. I speculate that this is due to some change in our parallelism stuff (since I observe that this query is producing a parallelized hash plan). Still, the next drop to circa 2:21 runtime is impressive enough by itself. > Heh, what a funny coincidence: > The regex I used to shrink the very-long-pattern, > actually happens to run a lot faster with the patches. Yeah, that just happens to be a poster child for the MATCHALL idea: > EXPLAIN ANALYZE SELECT > regexp_matches(repeat('a',100000),'^(.{1,80})(.*?)(.{1,80})$'); Each of the parenthesized subexpressions of the RE is successfully recognized as being MATCHALL, with length range 1..80 for two of them and 0..infinity for the middle one. That means the engine doesn't have to physically scan the text to determine whether a possible division point satisfies the sub-regexp; and that means we can find the correct division points in O(N) not O(N^2) time. regards, tom lane