[PHP-DEV] Re: [PHP-I18N] RFC: Error handling in HTTP input decoding

Andrei Zmievski Fri, 07 Jul 2006 11:09:48 -0700

Rasmus and I talked about this some more yesterday, and I think there is an alternate, better approach.

PHP will attempt to decode the incoming request data as described below. The variables that it decodes successfully will be put into the request arrays as Unicode strings, those that fail -- as binary strings. We will still set a flag indicating that there were problems during the conversion, but this way the user has access to the raw input in case of failure. Since we will be pushing the usage of the input filter extension, we should use it to access the request parameters (instead of the proposed request_get_raw() function below). The input filter extension always looks in the raw input data and not in the request arrays, and input_get_arg() has a 'charset' parameter that can be specified to tell PHP what charset the incoming data is in. I think this way we kill two birds with one stone: we give people access to the request array data on successful decoding and we also give them a standard and secure way to get at the data in case of failed decoding.
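
A minimal sketch of the per-variable behaviour described above, not the actual implementation. decode_request_vars() is a hypothetical helper, and since current PHP has no separate Unicode and binary string types, UTF-8 strings stand in for the Unicode strings and untouched byte strings for the binary ones. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not part of PHP): decode each raw request value on its own.
// Values that decode cleanly are stored as UTF-8; values that fail are kept as
// their original bytes, and a flag records that at least one failure occurred.
function decode_request_vars(array $raw, string $charset): array
{
    $vars = [];
    $failed = false;
    foreach ($raw as $name => $bytes) {
        if (mb_check_encoding($bytes, $charset)) {
            // Valid in the assumed charset: convert and store the decoded form.
            $vars[$name] = mb_convert_encoding($bytes, 'UTF-8', $charset);
        } else {
            // Invalid: keep the raw bytes so the application can still get at them.
            $vars[$name] = $bytes;
            $failed = true;
        }
    }
    return ['vars' => $vars, 'failed' => $failed];
}

// "\xE9" on its own is not a valid UTF-8 sequence, so 'q' fails to decode.
$raw = ['p' => 'php', 'q' => "caf\xE9"];
$result = decode_request_vars($raw, 'UTF-8');
```

Here 'p' ends up decoded, 'q' keeps its raw bytes, and the flag is set, so the application could retry 'q' alone with another charset.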


Please comment.

-Andrei

On Jun 22, 2006, at 2:46 PM, Andrei Zmievski wrote:

I'd like to solicit opinions on how we should treat conversion failures during HTTP input decoding. There are two issues at hand: fallback mechanism and application-driven decoding in case of failure. Let's look at the proposal for the latter one first.
If the decoding of HTTP input fails (and the failure state would be achieved as soon as even one variable fails), PHP should set an error flag somewhere that is accessible to the user, via either a global variable or a function. It should also keep the original request data around (query string, POST body, and cookie data). The application should be able to access this data, since the encoding can be passed in the query string [1]. The application can then check this error flag and then call a function -- request_decode() perhaps -- to ask PHP to re-decode the request data based on this specific encoding. For example:
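  // request_decoding_failed(), request_decode() and request_get_raw() are
  // proposed names only; 'ei' is the query parameter carrying the charset [1].
  if (request_decoding_failed()) {
     request_decode(request_get_raw('ei'));
  }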
We might be able to tie this in with the input filter, but that means that the input filter will have to be required by PHP. I am open to other suggestions in this area.
As for the first issue, PHP attempts to decode the input using the value of the unicode.output_encoding setting, because that is the most logical choice if we assume that the clients send the data back in the encoding that the page with the form was in. We could implement a fallback mechanism where PHP looks at the Accept-Charset header sent by the client [2]. This header is supposed to indicate what character sets are acceptable for the response. While this is not the same as specifying the character set of the request, it might be a good enough indicator of it. Or we could simply set the error state and let the application figure out what charset it wants to use for decoding.
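One way the Accept-Charset fallback could look, as a rough sketch only: accept_charset_candidates() is a hypothetical helper that turns the header value into a list of charsets ordered by their q weight, which the decoder could then try in turn. The example header value is the one given in RFC 2616, section 14.2.

```php
<?php
// Hypothetical helper (not part of PHP): list the charsets named in an
// Accept-Charset header value, highest q weight first.
function accept_charset_candidates(string $header): array
{
    $weights = [];
    foreach (explode(',', $header) as $item) {
        $parts = array_map('trim', explode(';', $item));
        $charset = strtolower($parts[0]);
        // Skip empty items and the "*" wildcard, which names no specific charset.
        if ($charset === '' || $charset === '*') {
            continue;
        }
        $q = 1.0; // A charset without an explicit q parameter has weight 1.
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        // q=0 means "not acceptable", so such charsets are left out.
        if ($q > 0) {
            $weights[$charset] = $q;
        }
    }
    arsort($weights); // Sort by weight, descending, keeping charset names as keys.
    return array_keys($weights);
}

$candidates = accept_charset_candidates('iso-8859-5, unicode-1-1;q=0.8');
```

The header describes what the client accepts in the response, so any charset picked this way is only a guess about the request and would still need to be validated against the raw input.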
Thanks for your attention.

-Andrei

[1] http://search.yahoo.com/search?ei=UTF-8&p=php
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
