Re: [PHP-DEV] default charset confusion

2012-04-01 Thread Daniel Convissor
Hi Folks: This topic appears to have been quietly tabled. I didn't notice a decision here or a commit. On Mon, Mar 12, 2012 at 01:12:03PM -0700, Rasmus Lerdorf wrote: > > So maybe a way to tackle this is to use the > mbstring internal encoding when it is set as the htmlspecialchars > default wh
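
A minimal userland sketch of the fallback idea in the quoted text: prefer an explicit charset, then the mbstring internal encoding when the extension is loaded, then UTF-8. The function name `resolve_charset` is hypothetical. This only models the idea in PHP and is not the engine's actual lookup. The name mbstring reports is not guaranteed to be one that htmlspecialchars() accepts.

```php
<?php
// Hypothetical helper (not part of PHP): choose the charset to hand to
// htmlspecialchars() when the caller may not have supplied one.
function resolve_charset($hint = null)
{
    // 1. An explicit, non-empty hint always wins.
    if ($hint !== null && $hint !== '') {
        return $hint;
    }

    // 2. Otherwise fall back to mbstring's internal encoding, if available.
    if (function_exists('mb_internal_encoding')) {
        $encoding = mb_internal_encoding();
        if (is_string($encoding) && $encoding !== '') {
            return $encoding;
        }
    }

    // 3. Last resort: the 5.4 built-in default discussed in this thread.
    return 'UTF-8';
}
```

Usage would look like `htmlspecialchars($text, ENT_QUOTES, resolve_charset($hint))`, where `$hint` is whatever the caller passed, if anything.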

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Ángel González
On 13/03/12 00:25, Stas Malyshev wrote: > Hi! > >> Still, that API is likely wrong: a library function written by someone >> completely unrelated to the main application shouldn't be echoing >> anything through the output. And if it's not generating the html, the >> htmlspecialchars is better done

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread jpauli
On Wed, Mar 14, 2012 at 3:37 PM, Gustavo Lopes wrote: > On Wed, 14 Mar 2012 14:55:17 +0100, jpauli wrote: > > I would then propose to make mbstring compile time mandatory. >> >> > I'm completely against these kind of lazy solutions. Yes, let's add strong > coupling (already starting to smell) to

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Gustavo Lopes
On Wed, 14 Mar 2012 14:55:17 +0100, jpauli wrote: I would then propose to make mbstring compile time mandatory. I'm completely against these kind of lazy solutions. Yes, let's add strong coupling (already starting to smell) to one of the largest extensions and make it compile time mandat

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Ferenc Kovacs
On Wed, Mar 14, 2012 at 3:29 PM, Michael Stowe wrote: > Correct me if I'm wrong, but I believe Zend Multibyte is now enabled by > default in PHP 5.4. > > - Mike > > http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91 http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend.c#108 http://lxr.php.net/op

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Michael Stowe
Correct me if I'm wrong, but I believe Zend Multibyte is now enabled by default in PHP 5.4. - Mike On Wed, Mar 14, 2012 at 9:24 AM, Ferenc Kovacs wrote: > > > > > > I would then propose to make mbstring compile time mandatory. > > > > I'm against yet another global ini setting, I find the ac

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Ferenc Kovacs
> > > I would then propose to make mbstring compile time mandatory. > > I'm against yet another global ini setting, I find the actual ini settings > confusing enough to add one more that would moreover reflect mbstring one's > (and add more and more confusion). > Why not turn ext/mbstring mandatory

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread jpauli
On Tue, Mar 13, 2012 at 1:52 AM, Yasuo Ohgaki wrote: > 2012/3/13 Rasmus Lerdorf : > > On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote: > >> I thought default_charset became UTF-8, so I was expecting > >> following HTTP header. > >> > >> content-type text/html; charset=UTF-8 > >> > >> However, I got e

Re: [PHP-DEV] default charset confusion

2012-03-13 Thread Tomas Kuliavas
2012.03.13 16:38 Richard Lynch rašė: > I'd have to agree with Stas that everybody should start passing in a > variable there, that can be set somewhere in a config, or, perhaps, > would DEFAULT to, e... You do realize that suggestions on this thread and original bug reporter failed to make cor
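
The "pass in a variable set somewhere in a config" suggestion quoted above could look roughly like the sketch below. The `$config` array and the `escape_html` wrapper are hypothetical names used only for illustration. The point is that the charset is always passed explicitly, so the result does not depend on the built-in default.

```php
<?php
// Assumption: the application keeps its charset in its own config array.
$config = array('charset' => 'UTF-8');

// Hypothetical wrapper: never relies on the default third argument.
function escape_html($text, array $config)
{
    return htmlspecialchars($text, ENT_QUOTES, $config['charset']);
}

echo escape_html('<a href="x">Tom & Jerry</a>', $config), "\n";
```

A site that stores ISO-8859-1 or GB2312 data would change the single config value rather than every call site.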

Re: [PHP-DEV] default charset confusion

2012-03-13 Thread Richard Lynch
On Mon, March 12, 2012 2:44 pm, Rasmus Lerdorf wrote: > But you can't necessarily hardcode the encoding if you are writing > portable code. That's a bit like hardcoding a timezone. In order to > write portable code you need to give people the ability to localize > it. If you wanted it portable, wo

Re: [PHP-DEV] default charset confusion

2012-03-13 Thread Christian Schneider
Am 13.03.2012, 02:34 Uhr, schrieb Rasmus Lerdorf : On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote: I always set all parameters for htmlentities/htmlspecialchars, therefore I haven't noticed this was changed from 5.3. They may be migrating from 5.2 or older. (RHEL5 uses 5.1) No, like I showed, movi

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote: > I always set all parameters for htmlentities/htmlspecialchars, therefore > I haven't noticed this was changed from 5.3. They may be migrating from > 5.2 or older. (RHEL5 uses 5.1) No, like I showed, moving from 5.3 to 5.4 breaks because the new default

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
2012/3/13 Rasmus Lerdorf : > On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote: >> I thought default_charset became UTF-8, so I was expecting >> following HTTP header. >> >> content-type  text/html; charset=UTF-8 >> >> However, I got empty charset (missing 'charset=UTF-8'). >> So I looked up to source and

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! Still, that API is likely wrong: a library function written by someone completely unrelated to the main application shouldn't be echoing anything through the output. And if it's not generating the html, the htmlspecialchars is better done from the return at the calling application (probably

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Ángel González
On 12/03/12 22:30, Stas Malyshev wrote: > Hi! > >> If you are a framework developer, and really want to shield against a >> bad php.ini setting, you could ini_set() to your preferred charset at the >> beginning of the request. > > That's assuming "the request" is completely processed by your framework

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! If you are a framework developer, and really want to shield against a bad php.ini setting, you could ini_set() to your preferred charset at the beginning of the request. That's assuming "the request" is completely processed by your framework and you never call any outside code and any outsi

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Ángel González
On 12/03/12 20:51, Stas Malyshev wrote: > Hi! > >> But you can't necessarily hardcode the encoding if you are writing >> portable code. That's a bit like hardcoding a timezone. In order to >> write portable code you need to give people the ability to localize it. > > No, it's not like timezone at a

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Pierre Joye
hi Rasmus, On Mon, Mar 12, 2012 at 9:12 PM, Rasmus Lerdorf wrote: > If everything was UTF-8 we wouldn't have any of these issues. > Unfortunately that isn't the case. The question is what to do with apps > that need to deal with non UTF-8 data. Are we going to provide any help > to them beyond j

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:51 PM, Stas Malyshev wrote: > Hi! > >> But you can't necessarily hardcode the encoding if you are writing >> portable code. That's a bit like hardcoding a timezone. In order to >> write portable code you need to give people the ability to localize it. > > No, it's not like timezo

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! But you can't necessarily hardcode the encoding if you are writing portable code. That's a bit like hardcoding a timezone. In order to write portable code you need to give people the ability to localize it. No, it's not like timezone at all. I have to support all timezones in a global app

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:40 PM, Stas Malyshev wrote: > Hi! > >> And yes, it may very well be dangerous to use the wrong charset and now >> that we have better support for GB2312 and other asian charsets in the >> entities functions in 5.4 it is even more prudent to choose the right >> one so we should pro

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! And yes, it may very well be dangerous to use the wrong charset and now that we have better support for GB2312 and other asian charsets in the entities functions in 5.4 it is even more prudent to choose the right one so we should provide some way to help people get it right short of changing

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Richard Lynch
On Mon, March 12, 2012 1:49 am, Rasmus Lerdorf wrote: > What we really need is what we added in PHP 6. A runtime encoding ini > setting that is distinct from the output charset which we can use > here. The usual argument against another php.ini setting, other than "too many already" is the difficu

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Michael Stowe
I think the ini directive, while adding another to the list, may be the most unobtrusive method to address this issue, at least for developers. I definitely agree with Rasmus that this could be one of the bigger headaches in transitioning to 5.4 (for non-UTF8 sites) and unless we can come up with

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote: > Hi > > I think following PHP 5.4.0 NEWS entry is misleading. > > . Changed default value of "default_charset" php.ini option from ISO-8859-1 > to > UTF-8. (Rasmus) Yes, I have fixed that now. > I thought default_charset became UTF-8, so I was

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 6:21 PM, Yasuo Ohgaki wrote: > Hi, > > I think motivation of > >       /* Default is now UTF-8 */ >       if (charset_hint == NULL) >               return cs_utf_8; > > is for better performance and I think it's good for better performance. > Alternative of my suggestion is

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
Hi, I think the motivation for /* Default is now UTF-8 */ if (charset_hint == NULL) return cs_utf_8; is better performance, and I think that's a good thing. An alternative to my suggestion is to introduce a new php.ini entry, as Rasmus mentioned. The name may be "

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
Hi, I think the following PHP 5.4.0 NEWS entry is misleading: . Changed default value of "default_charset" php.ini option from ISO-8859-1 to UTF-8. (Rasmus) I thought default_charset became UTF-8, so I was expecting the following HTTP header: content-type text/html; charset=UTF-8 However, I go

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:52 AM, Stas Malyshev wrote: > Hi! > >> Ignoring 5.4 for a second, if you in 5.3 do this: >> >> echo htmlspecialchars($string); >> echo htmlspecialchars($string, NULL, "ISO-8859-1"); >> echo htmlspecialchars($string, NULL, "UTF-8"); >> >> You will see that the first two output the

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Adam Jon Richardson
On Mon, Mar 12, 2012 at 3:52 AM, Stas Malyshev wrote: > Hi! > > > Ignoring 5.4 for a second, if you in 5.3 do this: >> >> echo htmlspecialchars($string); >> echo htmlspecialchars($string, NULL, "ISO-8859-1"); >> echo htmlspecialchars($string, NULL, "UTF-8"); >> >> You will see that the first two

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! Ignoring 5.4 for a second, if you in 5.3 do this: echo htmlspecialchars($string); echo htmlspecialchars($string, NULL, "ISO-8859-1"); echo htmlspecialchars($string, NULL, "UTF-8"); You will see that the first two output the escaped string with the GB2312 bytes intact within it and the UTF-
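
The difference under discussion can be reproduced with a short script. This is a sketch that assumes PHP 5.4 or later and passes the charset explicitly on every call. The sample bytes are "你好" encoded as GB2312, chosen here only for illustration. Per the PHP manual, htmlspecialchars() returns an empty string when the input is not valid in the given encoding, unless ENT_IGNORE or ENT_SUBSTITUTE is set.

```php
<?php
// GB2312 bytes for two Chinese characters, followed by some markup.
// These bytes are not a valid UTF-8 sequence.
$gb2312 = "\xC4\xE3\xBA\xC3 <b>";

// Treated as ISO-8859-1, every byte is a valid character, so the
// multibyte data passes through untouched and only the markup is escaped.
$asLatin1 = htmlspecialchars($gb2312, ENT_QUOTES, 'ISO-8859-1');

// Treated as UTF-8, the input is invalid and the result is empty.
$asUtf8 = htmlspecialchars($gb2312, ENT_QUOTES, 'UTF-8');

var_dump($asLatin1 === "\xC4\xE3\xBA\xC3 &lt;b&gt;"); // bool(true)
var_dump($asUtf8 === '');                             // bool(true)
```

Code that relied on the old ISO-8859-1 default with non-UTF-8 data would therefore see empty output once the default becomes UTF-8, which is the breakage described in this thread.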

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:41 AM, Rasmus Lerdorf wrote: > $string = $string = "$gb2312"; Sorry, typo there obviously. Just one $string. -Rasmus

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:10 AM, Stas Malyshev wrote: > Hi! > >> What we really need is what we added in PHP 6. A runtime encoding ini >> setting that is distinct from the output charset which we can use here. >> That would allow people to fix all their legacy code to a specific >> runtime encoding with a

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 3:10 PM, Stas Malyshev wrote: > Hi! > > >> What we really need is what we added in PHP 6. A runtime encoding ini >> setting that is distinct from the output charset which we can use here. >> That would allow people to fix all their legacy code to a specific >> runtime encod

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 3:10 PM, Stas Malyshev wrote: > Hi! > > >> What we really need is what we added in PHP 6. A runtime encoding ini >> setting that is distinct from the output charset which we can use here. >> That would allow people to fix all their legacy code to a specific >> runtime encod

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! What we really need is what we added in PHP 6. A runtime encoding ini setting that is distinct from the output charset which we can use here. That would allow people to fix all their legacy code to a specific runtime encoding with a single ini setting instead of changing thousands of lines o

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Adam Jon Richardson
On Mon, Mar 12, 2012 at 2:49 AM, Rasmus Lerdorf wrote: > What we really need is what we added in PHP 6. A runtime encoding ini > setting that is distinct from the output charset which we can use here. > That would allow people to fix all their legacy code to a specific > runtime encoding with a s

Re: [PHP-DEV] default charset confusion

2012-03-11 Thread Laruence
On Mon, Mar 12, 2012 at 2:49 PM, Rasmus Lerdorf wrote: > I caused this situation myself by not explicitly differentiating between > the default charset for the internal htmlspecialchars() and > htmlentities() functions and the output charset directive ini directive > default_charset. > > The idea