Hi, On Fri, Feb 26, 2021, at 01:16, Tom Lane wrote: > 0007-smarter-regex-allocation-2.patch
I've successfully tested this patch. I had to re-create the performance_test table since some cases the previously didn't give an error, now gives error "invalid regular expression: invalid character range". This is expected and of course an improvement, but just wanted to explain why the number of rows don't match the previous test runs. CREATE TABLE performance_test AS SELECT subjects.subject, patterns.pattern, patterns.flags, tests.is_match, tests.captured FROM tests JOIN subjects ON subjects.subject_id = tests.subject_id JOIN patterns ON patterns.pattern_id = subjects.pattern_id WHERE tests.error IS NULL -- -- the below part is added to ignore cases -- that now results in error: -- AND NOT EXISTS ( SELECT 1 FROM deviations WHERE deviations.test_id = tests.test_id AND deviations.error IS NOT NULL ); SELECT 3253889 Comparing 13.2 with HEAD, not a single test resulted in a different is_match value, i.e. the test just using the ~ regex operator, to only check if it matches or not. Good. SELECT COUNT(*) FROM deviations JOIN tests ON tests.test_id = deviations.test_id WHERE tests.is_match <> deviations.is_match count ------- 0 (1 row) The below query shows a frequency count per error message: SELECT error, COUNT(*) FROM deviations GROUP BY 1 ORDER BY 2 DESC error | count -----------------------------------------------------+-------- | 106173 regexp_match() does not support the "global" option | 5799 invalid regular expression: invalid character range | 1060 invalid regular expression option: "y" | 277 (4 rows) As we can see, 106173 cases now goes through without an error, that previously gave an error. This is thanks to now allowing escape sequences within bracket expressions. The other errors are expected and all good. End of correctness analysis. Now let's look at performance! I reran the same query three times to get a feeling for the stddev. \timing SELECT is_match <> (subject ~ pattern), captured IS DISTINCT FROM regexp_match(subject, pattern, flags), COUNT(*) FROM performance_test GROUP BY 1,2 ORDER BY 1,2; ?column? | ?column? | count ----------+----------+--------- f | f | 3253889 (1 row) HEAD (b3a9e9897ec702d56602b26a8cdc0950f23b29dc) Time: 125938.747 ms (02:05.939) Time: 125414.792 ms (02:05.415) Time: 126185.496 ms (02:06.185) HEAD (b3a9e9897ec702d56602b26a8cdc0950f23b29dc)+0007-smarter-regex-allocation-2.patch ?column? | ?column? | count ----------+----------+--------- f | f | 3253889 (1 row) Time: 89145.030 ms (01:29.145) Time: 89083.210 ms (01:29.083) Time: 89166.442 ms (01:29.166) That's a 29% speed-up compared to HEAD! Truly amazing. Let's have a look at the total speed-up compared to PostgreSQL 13. In my previous benchmarks testing against old versions, I used precompiled binaries, but this time I compiled REL_13_STABLE: Time: 483390.132 ms (08:03.390) That's a 82% speed-up in total! Amazing! /Joel