Hi,

I'm +1 for having internal/input/output/script encoding settings at the PHP
or Zend level.

If the default is the problem, we should make UTF-8 the default value of
default_charset and use it as the default for internal/input/output/script
encoding and for the functions that are affected by encoding.

When the XSS advisory was released in Feb. 2000, it stated that the encoding
MUST be specified in the HTTP response header. Setting default_charset
is the best practice from a security perspective anyway.

If we use default_charset as the default encoding, the transition to 5.4 might
be easier.
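
To make the idea concrete, here is a minimal php.ini sketch. Only the
default_charset line reflects an existing directive; the commented-out
lines describe the proposed fallback behaviour and are an assumption, not
how 5.4 behaves today.

```ini
; Existing directive: PHP adds this charset to the Content-Type header it
; sends by default, e.g. "Content-Type: text/html; charset=UTF-8".
default_charset = "UTF-8"

; Proposal only (assumption): if the per-feature settings below were left
; unset, they would fall back to default_charset instead of each having
; its own default.
;iconv.input_encoding =
;iconv.internal_encoding =
;iconv.output_encoding =
;mbstring.internal_encoding =
;mbstring.http_output =
```

Apps that really need different input and output encodings could still
set the individual directives explicitly.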

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net


2012/8/24 Rasmus Lerdorf <ras...@lerdorf.com>:
> htmlspecialchars(), htmlentities(), html_entity_decode() and
> get_html_translation_table() all take an encoding parameter that used to
> default to iso-8859-1. We changed the default in PHP 5.4 to UTF-8. This
> is a much more sensible default and in the case of the encoding
> functions more secure as it prevents invalid UTF-8 from getting through.
> If you use 8859-1 as the default but your app is actually in UTF-8 or
> worse, some encoding that isn't low-ascii compatible then
> htmlspecialchars()/htmlentities() aren't doing what you think they are
> and you have a glaring security hole in your app.
>
> However, people are understandably lazy and don't want to think about
> this stuff. They don't want to explicitly provide their input encoding
> to these calls. We provided a solution to this and a way to write
> portable apps and that was to pass in an empty string "" as the
> encoding. If we saw this we would set the input encoding to match the
> output encoding specified by the "default_charset" ini setting. We
> couldn't just default to this default_charset because input and output
> encodings may very well be different and we would risk making existing
> apps insecure. For example an app using BIG5/CJK for its output encoding
> might very well be pulling data from 8859/UTF-8 data sources and if we
> invisibly switched htmlspecialchars/entities to match their output
> encoding we would have problems. Invisibly switching them from 8859-1 to
> UTF-8 could still be problematic, but at least it fails safe in that
> it doesn't let invalid UTF-8 through and encodes low-ascii the same way
> it did before.
>
> The problem is that there is a lot of legacy code out there that doesn't
> explicitly set the encoding on those calls and it is a lot of work to go
> through and specify it on each call. I still personally prefer to have
> people be explicit here, but I think it is slowing 5.4 adoption (see bug
> 61354).
>
> In PHP 6 we tried to introduce separate input, script and output
> encoding settings. Currently in 5.4 we don't have that, but we have
> those 3 separately for mbstring and for iconv:
>
> iconv.input_encoding
> iconv.internal_encoding
> iconv.output_encoding
> mbstring.http_input
> mbstring.internal_encoding
> mbstring.http_output
>
> Ideally we should be getting rid of the per-feature encoding settings
> and have a single set of them that we refer to when we need them. This
> is one of these places where we really need a default input encoding
> setting. We could have it check mbstring.http_input, but there is a
> wrinkle here that it has a fancy "auto" setting which we don't really
> want in this case. So we could set it to iconv.input_encoding, but that
> seems rather random and unintuitive.
>
> So do we create a new default_input_encoding ini directive mid-stream in
> 5.4 for this? Of course with the longer-term in mind that this will be
> part of a unified set of encoding settings in 5.5 and beyond.
>
> -Rasmus
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
