On Fri, May 3, 2019 at 11:44 AM Christoph M. Becker <cmbecke...@gmx.de> wrote:
> On 03.05.2019 at 01:18, Björn Larsson wrote:
> >
> > On 2019-04-11 at 15:41, Christoph M. Becker wrote:
> >
> >> On 02.04.2019 at 11:42, Nicolai Scheer wrote:
> >>
> >>> I'm currently in the process of migrating an old application from
> >>> PHP 5.6 to 7.2.
> >>> In the process, I fiddled with the default_charset ini setting.
> >>>
> >>> The documentation states (cf.
> >>> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
> >>>
> >>> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value
> >>> of default_charset will also be used to set the default character set
> >>> for [...] and for mbstring functions if the mbstring.http_input
> >>> mbstring.http_output mbstring.internal_encoding configuration option
> >>> is unset."
> >>>
> >>> As such, I'd expect to be able to set default_charset to iso-8859-1 and
> >>> mbstring to pick up that same setting for its internal encoding (if the
> >>> mentioned directives are unset, that is).
> >>>
> >>> This seems not to be the case:
> >>>
> >>> <?php
> >>> ini_set( 'default_charset', 'iso-8859-1' );
> >>> var_dump( ini_get("mbstring.internal_encoding") );
> >>> var_dump( ini_get("mbstring.http_input") );
> >>> var_dump( ini_get("mbstring.http_output") );
> >>> echo mb_internal_encoding() . "\n";
> >>> echo mb_strlen( "\xc3\xb6" ) . "\n";
> >>> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
> >>>
> >>> This outputs (7.2.15 on a CentOS box):
> >>> string(0) ""
> >>> string(0) ""
> >>> string(0) ""
> >>> UTF-8
> >>> 1
> >>> 2
> >>>
> >>> The default_charset is set but the mbstring settings are not, so I'd
> >>> expect to get 2 as the character/byte count in both cases.
> >>>
> >>> If I throw an mb_internal_encoding("iso-8859-1") into the mix, both
> >>> string lengths are equal.
> >>>
> >>> Since the mentioned mbstring directives are deprecated as of 5.6.0, do
> >>> I really need to use mb_internal_encoding() instead?
> >>> Is the documentation wrong or am I just misinterpreting it? I thought
> >>> that default_charset should act as some kind of "master setting" so
> >>> that I would not have to set all the specific settings as well (e.g.
> >>> iconv, mbstring).
> >>>
> >>> Usually we use UTF-8, so I had not come across this before...
> >>>
> >>> Any insight?
> >>
> >> <https://3v4l.org/ZvQ67> confirms the reported behavior, and so does a
> >> quick look at the code. I suggest you file a ticket on
> >> <https://bugs.php.net/>.
> >
> > Did this lead to a bug report?
>
> Hmm, apparently not.
>

This was reported as https://bugs.php.net/bug.php?id=77907 and will be fixed in 7.4.

Nikita

> > It led to a bug in Smarty 3.1.33 for me. I got a warning about
> > "mbregex compile err: invalid code point value" in mb_split().
> > I have content in ISO-8859-1, and both Smarty's normal procedure for
> > setting the encoding and setting php.ini to ISO-8859-1 flunked.
> >
> > However, mb_regex_encoding('ISO-8859-1') did the trick!
>
> While the RFC[1] states
>
> | all functions that take encoding option use php.internal_encoding as
> | default (e.g. htmlentities/mb_strlen/mb_regex/etc)
>
> apparently this has not been implemented (yet).
>
> [1] <https://wiki.php.net/rfc/default_encoding>
>
> --
> Christoph M. Becker
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
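For anyone stuck on PHP 7.2 or 7.3, the workaround discussed in this thread (setting the mbstring encodings explicitly instead of relying on default_charset to propagate) might look like the following. This is a minimal sketch, assuming the mbstring extension is loaded; sync_mbstring_with_default_charset() is a hypothetical helper name, not a PHP or Smarty function. The mb_regex_encoding() call is guarded because mbstring can be built without regex support.

```php
<?php
// Minimal sketch, assuming ext/mbstring is loaded.
// sync_mbstring_with_default_charset() is a HYPOTHETICAL helper,
// not part of PHP or Smarty.
function sync_mbstring_with_default_charset(): string
{
    // default_charset defaults to "UTF-8" since PHP 5.6.
    $charset = ini_get('default_charset');
    if ($charset === false || $charset === '') {
        $charset = 'UTF-8';
    }

    // In the 7.2 run reported above, mbstring did not follow
    // default_charset, so set its internal encoding explicitly.
    mb_internal_encoding($charset);

    // mb_split() and the other mb_ereg* functions use the separate
    // regex encoding; setting it is what fixed the Smarty warning.
    if (function_exists('mb_regex_encoding')) {
        mb_regex_encoding($charset);
    }

    return $charset;
}

ini_set('default_charset', 'ISO-8859-1');
sync_mbstring_with_default_charset();

echo mb_internal_encoding(), "\n";         // ISO-8859-1
echo mb_strlen("\xc3\xb6"), "\n";          // 2 (two single-byte characters)
echo mb_strlen("\xc3\xb6", '8bit'), "\n";  // 2
```

Per Nikita's reply, the default_charset fallback for the internal encoding should work from 7.4 on, in which case the mb_internal_encoding() call is redundant but harmless. Given Christoph's note that the RFC's mb_regex behaviour had not been implemented, the explicit mb_regex_encoding() call is kept.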