I think #2 is better than #1.
The current implementation of mbstring is based on a solution similar
to #1. It is simple and stable, but #2 is more flexible.

Rui

On Thu, 14 Dec 2006 21:59:44 +0100
Pierre <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> Yesterday, Ilia, Andrei and I discussed possible solutions to the
> input encoding problem in PHP 6 (Unicode). I will try to describe
> them here.
> 
> I will not go too deep into the details; the goal is to choose one
> solution and then propose a patch to test. Our preference is
> solution #2.
> 
> --
> Solution #1:
> ------------
> The idea here is to detect the encoding, then decode and register the
> variables during request initialization (before the script gets
> control). Apart from the encoding detection, this is how the current
> implementation works (all PHP versions).
> 
> * Init
>  - Parse the request into an array.
>  - locate _charset_ or use unicode.request_encoding
>  - filter/decode/register the variable like it is done now
> 
> * Runtime
> Just like now, the auto_globals (with or without jit) are declared and
> ready to be used.
> 
> This solution has one advantage: it requires only a few changes in
> the engine. The request processing functions need to be changed
> to detect the encoding.
> 
> The main disadvantages are:
> - the lack of flexibility: the encoding must be set before the script
>   gets control, using the vhost config or .htaccess
> - a wrong encoding detection will force the user to manually parse
>   the raw request (when available).
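
A rough model of Solution #1 as described above, written in Python
only for illustration (the real change would be in the engine's C
code). The names split_raw, init_request and request_encoding are
hypothetical. Everything is decoded once, at init, so the script
cannot influence the encoding.

```python
from urllib.parse import unquote_to_bytes


def split_raw(raw: bytes) -> list:
    """Split a urlencoded request into percent-decoded (name, value) byte pairs."""
    pairs = []
    for field in raw.split(b"&"):
        if not field:
            continue
        name, _, value = field.partition(b"=")
        pairs.append((unquote_to_bytes(name.replace(b"+", b" ")),
                      unquote_to_bytes(value.replace(b"+", b" "))))
    return pairs


def init_request(raw: bytes, request_encoding: str) -> dict:
    """Eager model: runs during request init, before the script gets control."""
    pairs = split_raw(raw)
    # Locate _charset_ in the request, else use the configured encoding
    # (stand-in for unicode.request_encoding).
    encoding = request_encoding
    for name, value in pairs:
        if name == b"_charset_" and value:
            encoding = value.decode("ascii")
            break
    # Decode and register every variable now; a wrong encoding fails here.
    return {n.decode(encoding): v.decode(encoding) for n, v in pairs}
```

With a wrong configured encoding and no _charset_ field, the decode
step raises before any user code runs, which is the second
disadvantage listed above.
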
> 
> 
> Solution #2: add (true) JIT support for GET/POST/COOKIE/...
> ------------
> Instead of doing all the processing during the init phase, it will be
> done on demand at runtime, when an input variable is requested.
> 
> * Init
>  - don't parse the request but simply store it for later processing
> 
> * Runtime
>  - when an input variable is fetched:
>    - encoding is defined using unicode.request_encoding
>    - filter/decode/register the complete array (post, get, ...)
> 
> The way JIT works has to be changed. It has to process the data
> at runtime instead of registering it at compile time. This is the only
> way to be sure that the user has set the input encoding correctly
> (or has had the opportunity to set it).
> 
> The main advantage of this solution is the absence of magic for
> the user. The encoding detection can be checked and/or set by the
> user before the input is processed, so it is safe and flexible.
> 
> I would also suggest adding a function, filter_input_encoding($type),
> to define the encoding type at runtime instead of using ini_set (which
> is often disabled).
> 
> There are no real technical disadvantages, but it requires more work
> and more changes in the engine. These changes will also bring some
> performance improvements (if (0) $t = $_ENV['foo']; will not trigger
> JIT).
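
A minimal sketch of the on-demand behaviour of Solution #2, again in
Python as a model and not as engine code. LazyInput, set_encoding and
decode_count are hypothetical names; set_encoding only stands in for
the proposed filter_input_encoding(), and refusing a change after the
first fetch is my assumption, not part of the proposal.

```python
from urllib.parse import unquote_to_bytes


class LazyInput:
    """Model of a JIT input array: raw bytes kept at init, decoded on first fetch."""

    def __init__(self, raw: bytes, request_encoding: str = "utf-8"):
        self._raw = raw                    # Init: store the request, do not parse it
        self.encoding = request_encoding   # stand-in for unicode.request_encoding
        self._vars = None                  # filled on the first fetch only
        self.decode_count = 0              # how many times decoding actually ran

    def set_encoding(self, encoding: str) -> None:
        # Assumption: changing the encoding after decoding is an error.
        if self._vars is not None:
            raise RuntimeError("input already decoded")
        self.encoding = encoding

    def _decode(self) -> dict:
        # Filter/decode/register the complete array in one pass.
        self.decode_count += 1
        result = {}
        for field in self._raw.split(b"&"):
            if not field:
                continue
            name, _, value = field.partition(b"=")
            key = unquote_to_bytes(name.replace(b"+", b" ")).decode(self.encoding)
            result[key] = unquote_to_bytes(value.replace(b"+", b" ")).decode(self.encoding)
        return result

    def __getitem__(self, key: str) -> str:
        if self._vars is None:             # Runtime: the first fetch triggers decoding
            self._vars = self._decode()
        return self._vars[key]
```

An array that is never read is never decoded, which corresponds to the
if (0) $t = $_ENV['foo']; case above, and the script can still correct
the encoding before the first fetch.
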
> 
> --
> 
> I would like to hear your ideas, opinions and comments. Especially
> about the possible changes in the engine. Feel free to ask for more
> details if my explanations were unclear :)
> 
> Regards,
> --Pierre

-- 
Rui Hirokawa <[EMAIL PROTECTED]>

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php