On Fri, Feb 19, 2021, at 16:26, Tom Lane wrote: >"Joel Jacobson" <j...@compiler.org> writes: >> On Thu, Feb 18, 2021, at 19:53, Tom Lane wrote: >>> (Having said that, I can't help noticing that a very large fraction >>> of those usages look like, eg, "[\w\W]". It seems to me that that's >>> a very expensive and unwieldy way to spell ".". Am I missing >>> something about what that does in Javascript?) > >> I think this is a non-POSIX hack to match any character, including newlines, >> which are not included unless the "s" flag is set. > >> "foo\nbar".match(/([\w\W]+)/)[1]; >> "foo >> bar" > >Oooh, that's very interesting. I guess the advantage of that over using >the 's' flag is that you can have different behaviors at different places >in the same regex.
I would guess the same thing. >I was just wondering about this last night in fact, while hacking on >the code to get it to accept \W etc in bracket expressions. I see that >right now, our code thinks that NLSTOP mode ('n' switch, the opposite >of 's') should cause \W \D \S to not match newline. That seems a little >weird, not least because \S should probably be different from the other >two, and it isn't. And now we see it'd mean that you couldn't use the 'n' >switch to duplicate Javascript's default behavior in this area. Should we >change it? (I wonder what Perl does.) > >regards, tom lane To allow comparing PostgreSQL vs Javascript vs Perl, I installed three helper-functions using plv8 and plperl, and also one convenience function for PostgreSQL to catch errors and return the error string instead: The string used in this test is "foo!\n!bar", which aims to detect differences in how new-lines and non alpha-number characters are handled. To allow PostgreSQL to be compared with Javascript and Perl, the "n" flag is used for PostgreSQL when no flags are used for Javascript/Perl, and no flag for PostgreSQL when the "s" flag is used for Javascript/Perl, for the results to be comparable. In Javascript, when a regex contains capture groups, the entire match is always returns as the first array element. To make it easier to visually compare the results, the first element is removed from Javascript, which works in this test since all regexes contain exactly one capture group. Here are the results: $ psql -e -f not_alnum.sql regex SELECT regexp_match_pg(E'foo!\n!bar', '(.+)', 'n'), (regexp_match_v8(E'foo!\n!bar', '(.+)', ''))[2:], regexp_match_pl(E'foo!\n!bar', '(.+)', '') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {foo!} | {foo!} | {foo!} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '(.+)', ''), (regexp_match_v8(E'foo!\n!bar', '(.+)', 's'))[2:], regexp_match_pl(E'foo!\n!bar', '(.+)', 's') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {"foo! +| {"foo! +| {"foo! + !bar"} | !bar"} | !bar"} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '([\w\W]+)', 'n'), (regexp_match_v8(E'foo!\n!bar', '([\w\W]+)', ''))[2:], regexp_match_pl(E'foo!\n!bar', '([\w\W]+)', '') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl ------------------------------------------------------------+-----------------+----------------- {"invalid regular expression: invalid escape \\ sequence"} | {"foo! +| {"foo! + | !bar"} | !bar"} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '([\w\W]+)', ''), (regexp_match_v8(E'foo!\n!bar', '([\w\W]+)', 's'))[2:], regexp_match_pl(E'foo!\n!bar', '([\w\W]+)', 's') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl ------------------------------------------------------------+-----------------+----------------- {"invalid regular expression: invalid escape \\ sequence"} | {"foo! +| {"foo! + | !bar"} | !bar"} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '([\w]+)', 'n'), (regexp_match_v8(E'foo!\n!bar', '([\w]+)', ''))[2:], regexp_match_pl(E'foo!\n!bar', '([\w]+)', '') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {foo} | {foo} | {foo} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '([\w]+)', ''), (regexp_match_v8(E'foo!\n!bar', '([\w]+)', 's'))[2:], regexp_match_pl(E'foo!\n!bar', '([\w]+)', 's') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {foo} | {foo} | {foo} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '([\W]+)', 'n'), (regexp_match_v8(E'foo!\n!bar', '([\W]+)', ''))[2:], regexp_match_pl(E'foo!\n!bar', '([\W]+)', '') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl ------------------------------------------------------------+-----------------+----------------- {"invalid regular expression: invalid escape \\ sequence"} | {"! +| {"! + | !"} | !"} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '([\W]+)', ''), (regexp_match_v8(E'foo!\n!bar', '([\W]+)', 's'))[2:], regexp_match_pl(E'foo!\n!bar', '([\W]+)', 's') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl ------------------------------------------------------------+-----------------+----------------- {"invalid regular expression: invalid escape \\ sequence"} | {"! +| {"! + | !"} | !"} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '(\w+)', 'n'), (regexp_match_v8(E'foo!\n!bar', '(\w+)', ''))[2:], regexp_match_pl(E'foo!\n!bar', '(\w+)', '') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {foo} | {foo} | {foo} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '(\w+)', ''), (regexp_match_v8(E'foo!\n!bar', '(\w+)', 's'))[2:], regexp_match_pl(E'foo!\n!bar', '(\w+)', 's') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {foo} | {foo} | {foo} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '(\W+)', 'n'), (regexp_match_v8(E'foo!\n!bar', '(\W+)', ''))[2:], regexp_match_pl(E'foo!\n!bar', '(\W+)', '') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {!} | {"! +| {"! + | !"} | !"} (1 row) SELECT regexp_match_pg(E'foo!\n!bar', '(\W+)', ''), (regexp_match_v8(E'foo!\n!bar', '(\W+)', 's'))[2:], regexp_match_pl(E'foo!\n!bar', '(\W+)', 's') ; regexp_match_pg | regexp_match_v8 | regexp_match_pl -----------------+-----------------+----------------- {"! +| {"! +| {"! + !"} | !"} | !"} (1 row) /Joel
not_alnum.sql
Description: Binary data