On Thu, Sep 24, 2020 at 11:30:35AM +0200, Boudewijn Dijkstra wrote:
> On Thu, 24 Sep 2020 02:56:51 +0200, Andrew Hewus Fresh
> <and...@afresh1.com> wrote:
> > On Wed, Sep 23, 2020 at 09:11:44AM +0200, Boudewijn Dijkstra wrote:
> > > On Thu, 10 Sep 2020 04:01:30 +0200, Bambero <bamb...@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > It seems that Perl regular expressions lost one Polish letter (ą):
> > > > https://www.compart.com/en/unicode/U+0105
> > > >
> > > > I can see this problem only under OpenBSD 6.7 with php-7.4 (same
> > > > version of PHP under Linux is OK)
> > > >
> > > > Ex.:
> > > >
> > > > PHP 7.4.10 or 7.4.5
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
> > > > int(1) // OK
> > > >
> > > > PHP 7.4.10 or 7.4.5
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
> > > > int(0) // UPS???
> > > >
> > > > PHP 7.3.21
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
> > > > int(1) // OK
> > > >
> > > > PHP 7.3.21
> > > > <?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
> > > > int(1) // OK
> > > >
> > > > Any ideas how to fix that?
> > > >
> > > > Regards,
> > > > Bambero
> > >
> > > The same happens with any UTF-8 sequence that ends in 0x85. I guess
> > > (a part of) PHP's PCRE code is not in UTF-8 mode, causing triggers
> > > on CHAR_NEL (=0x85).
> >
> > I don't know a lot about PHP or the external PCRE library, but my guess
> > would be that PHP is treating the string as bytes, not characters. Can
> > you try using the "u" (PCRE_UTF8) modifier?
> >
> > https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
>
> Indeed with "u" the expected 1 is returned! Now the question is, why is this
> needed on OpenBSD but not on Linux or Windows?

There are many Unicode-related changes in PHP 7.4, so I'm sure they
fixed something.

https://www.php.net/ChangeLog-7.php

I would guess that Linux and Windows both default to a UTF-8 locale,
while OpenBSD defaults to the C locale. Does the output from locale(1)
provide you any hints? Do you get any different results testing it with
`LC_ALL=en_US.UTF-8`?

I don't know enough about PHP to know how it determines what locale to
use, so that may not have any effect, or you may need to adjust
something else.

l8rZ,
-- 
andrew - http://afresh1.com

Adding manpower to a late software project makes it later.
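
P.S. In case it helps with testing, here is a rough sketch of the checks discussed above, run from PHP itself. It is only an illustration: the variable name `$subject` is mine, the subject string uses the `"\u{0105}"` escape for ą so the script does not depend on the file's encoding, and whether the locale has any effect on the byte-mode result is a guess on my part, not something I have confirmed.

```php
<?php
// Illustrative check only; the byte-mode results may differ by platform.
$subject = "dasw\u{0105}zdas"; // 'daswązdas': 9 characters, 10 bytes in UTF-8

// U+0105 is encoded in UTF-8 as the two bytes C4 85.
echo bin2hex("\u{0105}"), "\n";

// Passing "0" makes setlocale() report the current locale without changing it.
echo 'locale: ', setlocale(LC_ALL, '0'), "\n";

// Byte mode: the pattern sees individual bytes, including 0x85.
var_dump(preg_match('/^.{5,64}$/', $subject));

// UTF-8 mode ("u" modifier): the pattern sees whole characters.
// This is the variant reported above to return 1 on OpenBSD.
var_dump(preg_match('/^.{5,64}$/u', $subject));

// Request a UTF-8 locale and retry byte mode.
// setlocale() returns false if the locale is not available.
var_dump(setlocale(LC_ALL, 'en_US.UTF-8'));
var_dump(preg_match('/^.{5,64}$/', $subject));
```

If the last `preg_match()` still returns 0 after the locale change, the locale is probably not the deciding factor and the "u" modifier remains the reliable fix.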