On Fri, May 3, 2019 at 11:44 AM Christoph M. Becker <cmbecke...@gmx.de>
wrote:

> On 03.05.2019 at 01:18, Björn Larsson wrote:
>
> > On 2019-04-11 at 15:41, Christoph M. Becker wrote:
> >
> >> On 02.04.2019 at 11:42, Nicolai Scheer wrote:
> >>
> >>> I'm currently in the process of migrating an old application from PHP
> >>> 5.6 to 7.2.
> >>> In the process, I fiddled with the default_charset ini setting.
> >>>
> >>> The documentation states (cf.
> >>> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
> >>>
> >>> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
> >>> default_charset will also be used to set the default character set for
> >>> [...] and for mbstring functions if the mbstring.http_input
> >>> mbstring.http_output mbstring.internal_encoding configuration option is
> >>> unset."
> >>>
> >>> As such, I'd expect to be able to set default_charset to iso-8859-1 and
> >>> mbstring to pick that same setting for its internal encoding (if the
> >>> mentioned directives are unset, that is).
> >>>
> >>> This seems not to be the case:
> >>>
> >>> <?php
> >>> ini_set( 'default_charset', 'iso-8859-1' );
> >>> var_dump( ini_get("mbstring.internal_encoding") );
> >>> var_dump( ini_get("mbstring.http_input") );
> >>> var_dump( ini_get("mbstring.http_output") );
> >>> echo mb_internal_encoding() . "\n";
> >>> echo mb_strlen( "\xc3\xb6" ) . "\n";
> >>> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
> >>>
> >>> This outputs (7.2.15 on a CentOS box):
> >>> string(0) ""
> >>> string(0) ""
> >>> string(0) ""
> >>> UTF-8
> >>> 1
> >>> 2
> >>>
> >>> The default_charset is set but mbstring settings are not, so I'd expect to
> >>> get 2 as the character/byte count in both cases.
> >>>
> >>> If I throw an mb_internal_encoding("iso-8859-1") into the mix, both string
> >>> lengths are equal.
> >>>
> >>> Since the mentioned mbstring directives are deprecated as of 5.6.0 - do I
> >>> really need to use mb_internal_encoding() instead?
> >>> Is the documentation wrong or am I just misinterpreting it? I thought that
> >>> default_charset should act as some kind of "master setting" in order not to
> >>> have to set all specific settings as well (e.g. iconv, mbstring).
> >>>
> >>> Usually we use UTF-8, so I did not come across this before...
> >>>
> >>> Any insight?
> >>
> >> <https://3v4l.org/ZvQ67> confirms the reported behavior.  A quick look
> >> at the code confirms it, too.  I suggest you file a ticket on
> >> <https://bugs.php.net/>.
> >
> > Did this lead to a bug report?
>
> Hmm, apparently not.
>

This was reported as https://bugs.php.net/bug.php?id=77907 and will be
fixed in 7.4.
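
For anyone on a release before 7.4, here is a minimal workaround sketch,
assuming the mbstring extension is loaded: copy default_charset into
mbstring's runtime settings by hand, in line with the mb_internal_encoding()
and mb_regex_encoding() observations elsewhere in this thread. The helper
name sync_mbstring_with_default_charset() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): pushes default_charset into
// mbstring's runtime settings, which the ini setting alone does not do here.
function sync_mbstring_with_default_charset(): string
{
    // Fall back to UTF-8 if default_charset is empty.
    $charset = ini_get('default_charset') ?: 'UTF-8';

    mb_internal_encoding($charset);

    // mbregex can be compiled out, so only call it when available.
    if (function_exists('mb_regex_encoding')) {
        mb_regex_encoding($charset);
    }

    // Report the encoding mbstring is now actually using.
    return mb_internal_encoding();
}

ini_set('default_charset', 'ISO-8859-1');
sync_mbstring_with_default_charset();

// "\xc3\xb6" is two bytes; under ISO-8859-1 that is also two characters.
echo mb_strlen("\xc3\xb6"), "\n";          // 2
echo mb_strlen("\xc3\xb6", '8bit'), "\n";  // 2
```

Calling the helper once during bootstrap, after any ini_set() on
default_charset, should be enough; it would need to be called again if
default_charset changes later in the request.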

Nikita


> > It led to a bug in Smarty 3.1.33 for me. I got a warning about
> > "mbregex compile err: invalid code point value" in mb_split().
> > I have content in ISO-8859-1, and Smarty's normal procedure of
> > setting the encoding and the php.ini setting to ISO-8859-1 failed.
> >
> > However mb_regex_encoding('ISO-8859-1') did the trick!
>
> While the RFC[1] states
>
> | all functions that take encoding option use php.internal_encoding as
> | default (e.g. htmlentities/mb_strlen/mb_regex/etc)
>
> apparently this has not been implemented (yet).
>
> [1] <https://wiki.php.net/rfc/default_encoding>
>
> --
> Christoph M. Becker
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
