Sv: Re: regex match and special characters

Andreas Joseph Krogh Thu, 16 Aug 2018 07:04:41 -0700

On Thursday, 16 August 2018 at 15:16:52, Adrian Klaver
<adrian.kla...@aklaver.com> wrote:
On 08/16/2018 03:59 AM, Alex Kliukin wrote:
 > Hi,
 >
 > Here is a simple SQL statement that gives different results on PostgreSQL
 > 9.6 and PostgreSQL 10+. The space character at the end of the string is
 > actually U+2006 SIX-PER-EM SPACE
 > (http://www.fileformat.info/info/unicode/char/2006/index.htm)
 >
 > test=# select 'abcd ' ~ 'abcd\s';
 >   ?column?
 > ----------
 >   t
 > (1 row)
 >
 > test=# select version();
 >                                               version
 > -------------------------------------------------------------------------------------------------
 >   PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
 > (1 row)
 >
 >
 > On another server (running on the same system on a different port)
 >
 > postgres=# select version();
 >                                              version
 > -----------------------------------------------------------------------------------------------
 >   PostgreSQL 9.6.9 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
 > (1 row)
 >
 > postgres=# select 'abcd ' ~ 'abcd\s';
 >   ?column?
 > ----------
 >   f
 > (1 row)
 >
 > For both clusters, the client encoding is UTF8, the database encoding and
 > collation are UTF8 and en_US.utf8 respectively, and lc_ctype is en_US.utf8.
 > I am accessing the databases locally, after first ssh-ing to the host.
 >
 > I observed similar issues with other Linux-based servers running Ubuntu, in 
all cases the regex resulted in true on PostgreSQL 10+ and false on earlier 
versions (down to 9.3). The query comes from a table check that suddenly 
stopped accepting rows valid in the older version during the migration. Making 
it  select 'abcd ' ~ E'abcd\\s' doesn't  modify the outcome, unsurprisingly.
 >
 > Is it reproducible for others here as well? If so, is there a way to make
 > both versions behave the same?


 select version();
                                        version
 ------------------------------------------------------------------------------------
   PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit

 lc_collate                          | en_US.UTF-8
 lc_ctype                            | en_US.UTF-8


 test=# select 'abcd'||chr(2006) ~ E'abcd\s';
   ?column?
 ----------
   f
 (1 row)

 In your example you are working on Postgres devel. Have you tried it on
 Postgres 10 and/or 11?
 
chr(2006) produces the wrong character: 2006 is the hexadecimal code point
(U+2006), while chr() takes a decimal argument. You have to use 8198, which
is 0x2006 in decimal:
 
andreak@[local]:5433 10.4 andreak=# select version();
 ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
 │                                            version                                             │
 ├────────────────────────────────────────────────────────────────────────────────────────────────┤
 │ PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0, 64-bit │
 └────────────────────────────────────────────────────────────────────────────────────────────────┘
 (1 row)

andreak@[local]:5433 10.4 andreak=# select 'abcd'||chr(8198) ~ 'abcd\s'; 
 ┌──────────┐
 │ ?column? │
 ├──────────┤
 │ t        │
 └──────────┘
 (1 row)
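
The decimal-versus-hexadecimal mix-up can be checked outside the database. Below is a minimal sketch in Python, not PostgreSQL: the variable names are made up for illustration, and Python's re module stands in only to show that the two code points are different characters. It says nothing about how any particular PostgreSQL version treats \s.

```python
import re

# U+2006 (SIX-PER-EM SPACE) is written in hexadecimal; chr() takes a decimal integer.
code_point = int("2006", 16)
print(code_point)   # 8198
print(hex(2006))    # 0x7d6, i.e. decimal 2006 is U+07D6, a different character

six_per_em = chr(code_point)  # the character from the original report
wrong_char = chr(2006)        # what chr(2006) actually yields

# Python's own regex engine, used here purely as an illustration.
pattern = re.compile(r"abcd\s")
print(bool(pattern.fullmatch("abcd" + six_per_em)))  # True
print(bool(pattern.fullmatch("abcd" + wrong_char)))  # False
```

So a test built with chr(2006) never exercises the character from the original report, and its result does not bear on the 9.6 versus 10+ difference.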
  
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com <https://www.visena.com>
