On Thursday 16 August 2018 at 15:16:52, Adrian Klaver <adrian.kla...@aklaver.com> wrote:

> On 08/16/2018 03:59 AM, Alex Kliukin wrote:
> > Hi,
> >
> > Here is a simple SQL statement that gives different results on PostgreSQL 9.6 and PostgreSQL 10+. The space character at the end of the string is actually U+2006 SIX-PER-EM SPACE (http://www.fileformat.info/info/unicode/char/2006/index.htm)
> >
> > test=# select 'abcd ' ~ 'abcd\s';
> >  ?column?
> > ----------
> >  t
> > (1 row)
> >
> > test=# select version();
> >  version
> > -------------------------------------------------------------------------------------------------
> >  PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
> > (1 row)
> >
> > On another server (running on the same system on a different port):
> >
> > postgres=# select version();
> >  version
> > -----------------------------------------------------------------------------------------------
> >  PostgreSQL 9.6.9 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
> > (1 row)
> >
> > postgres=# select 'abcd ' ~ 'abcd\s';
> >  ?column?
> > ----------
> >  f
> > (1 row)
> >
> > For both clusters, the client encoding is UTF8, the database encoding and collation are UTF8 and en_US.utf8 respectively, and the lc_ctype is en_US.utf8. I am accessing the databases running locally by ssh-ing first to the host.
> >
> > I observed similar issues with other Linux-based servers running Ubuntu; in all cases the regex resulted in true on PostgreSQL 10+ and false on earlier versions (down to 9.3). The query comes from a table check that suddenly stopped accepting rows valid in the older version during the migration. Making it select 'abcd ' ~ E'abcd\\s' doesn't modify the outcome, unsurprisingly.
> >
> > Is it reproducible for others here as well? Given that it is, is there a way to make both versions behave the same?
>
> select version();
>  version
> ------------------------------------------------------------------------------------
>  PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit
>
> lc_collate | en_US.UTF-8
> lc_ctype   | en_US.UTF-8
>
> test=# select 'abcd'||chr(2006) ~ E'abcd\s';
>  ?column?
> ----------
>  f
> (1 row)
>
> In your example you are working on Postgres devel. Have you tried it on Postgres 10 and/or 11?

chr(2006) produces the wrong character, as 2006 is the hex value. You have to use 8198, its decimal equivalent:

andreak@[local]:5433 10.4 andreak=# select version();
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│ version │
├────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0, 64-bit │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)

andreak@[local]:5433 10.4 andreak=# select 'abcd'||chr(8198) ~ 'abcd\s';
┌──────────┐
│ ?column? │
├──────────┤
│ t        │
└──────────┘
(1 row)

-- 
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com
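The hex-versus-decimal point can be checked outside the database. This is only a sketch in Python, not PostgreSQL: `describe()` is a hypothetical helper that uses the standard `unicodedata` module to show that decimal 8198 is U+2006 SIX-PER-EM SPACE while decimal 2006 is a different, non-space character. As a loose analogy to the 9.6 vs 10+ difference, it also shows Python's `re` matching U+2006 with a Unicode-aware `\s` but not with an ASCII-only `\s` (`re.ASCII`); Python's `re` is a separate regex engine, so this says nothing about why PostgreSQL's behaviour changed.

```python
import re
import unicodedata


def describe(code_point: int) -> tuple:
    """Hypothetical helper: (Unicode name, general category, isspace) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch), ch.isspace()


# "U+2006" is hexadecimal notation; chr() takes the integer 0x2006, i.e. 8198.
assert 0x2006 == 8198

print(describe(8198))  # ('SIX-PER-EM SPACE', 'Zs', True)
print(describe(2006))  # decimal 2006 is U+07D6, a letter rather than a space

# Analogy only (Python's re, not PostgreSQL's regex engine):
text = "abcd" + chr(8198)
print(bool(re.fullmatch(r"abcd\s", text)))            # True: Unicode-aware \s
print(bool(re.fullmatch(r"abcd\s", text, re.ASCII)))  # False: ASCII-only \s
```

The same distinction applies to the psql tests above: `chr()` takes a decimal integer, so a code point quoted as U+XXXX has to be converted from hex first.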