On Thu, 10 Sep 2020 04:01:30 +0200, Bambero <bamb...@gmail.com> wrote:
Hi,

It seems that Perl-compatible regular expressions (PCRE) no longer match one Polish letter (ą):
https://www.compart.com/en/unicode/U+0105

I can see this problem only under OpenBSD 6.7 with PHP 7.4 (the same version of PHP under Linux is OK).

Ex.:

PHP 7.4.10 or 7.4.5
<?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
int(1) // OK

PHP 7.4.10 or 7.4.5
<?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
int(0) // OOPS???

PHP 7.3.21
<?php var_dump(preg_match('/^.{5,64}$/', 'daswęzdas'));
int(1) // OK

PHP 7.3.21
<?php var_dump(preg_match('/^.{5,64}$/', 'daswązdas'));
int(1) // OK

Any ideas how to fix that?

Regards,
Bambero

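The same happens with any UTF-8 sequence that ends in the byte 0x85. My guess is that (part of) PHP's PCRE code is not running in UTF-8 mode, so the trailing 0x85 byte is taken for CHAR_NEL (= 0x85, the "next line" newline character), which `.` does not match.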
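
To illustrate why ą is affected and ę is not, here is a minimal sketch that prints the UTF-8 bytes of both letters and checks whether an encoding ends in 0x85. The helper name endsIn0x85 is hypothetical and not part of PHP or of this thread; it only covers code points from U+0080 upward.

```php
<?php
// Hypothetical helper (not from the thread): does the UTF-8 encoding
// of code point $cp end in the byte 0x85?
function endsIn0x85(int $cp): bool
{
    // In a multi-byte UTF-8 sequence the last byte is 0x80 | (low six bits).
    // Code points below U+0080 are single ASCII bytes and never equal 0x85.
    return $cp >= 0x80 && (0x80 | ($cp & 0x3F)) === 0x85;
}

var_dump(bin2hex("\u{0105}"));   // ą: string(4) "c485"
var_dump(bin2hex("\u{0119}"));   // ę: string(4) "c499"
var_dump(endsIn0x85(0x0105));    // bool(true)
var_dump(endsIn0x85(0x0119));    // bool(false)
```

So ą (U+0105) is encoded as C4 85 and would trip over the NEL byte, while ę (U+0119) is encoded as C4 99 and would not.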

for ($i = 0x75; $i <= 0x825; $i++) {
        $u = mb_chr($i, 'UTF-8');               // UTF-8 encoding of code point $i
        $str = 'dasw' . $u . 'zdas';
        $r = preg_match('/^.{5,64}$/', $str);
        if ($r == 0) {
                // No match: print the code point, the bytes of its encoding, and the subject.
                printf("%04x:", $i);
                for ($j = 0; $j < strlen($u); $j++) {
                        $b = ord($u[$j]);
                        printf(" %02x", $b);
                }
                printf(": %s\n", $str);
        }
}
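
If the guess about byte mode is right, one possible workaround is to add the `u` pattern modifier, which tells PHP to treat both pattern and subject as UTF-8. The sketch below only compares the two modes; it has not been verified on OpenBSD 6.7 in this thread, so the effect there is an assumption.

```php
<?php
// "\u{0105}" is ą, written as an escape so the file encoding does not matter.
$subject = 'dasw' . "\u{0105}" . 'zdas';

// Byte mode (no /u): this is the call that returns 0 on the affected build.
$byteMode = preg_match('/^.{5,64}$/', $subject);

// UTF-8 mode (/u): the subject is matched as code points, so the
// 0x85 continuation byte of ą is not examined on its own.
$utf8Mode = preg_match('/^.{5,64}$/u', $subject);

var_dump($byteMode, $utf8Mode);
```

With `/u` the quantifier `{5,64}` also counts characters instead of bytes, which is usually what a length check on user-visible text wants.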


