I think #2 is better than #1. The current implementation of mbstring is based on a solution similar to #1. It is simple and stable, but #2 offers more flexibility.
Rui

On Thu, 14 Dec 2006 21:59:44 +0100
Pierre <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Yesterday, Ilia, Andrei and I discussed the possible solutions to solve
> the input encoding in php6 (unicode). I will try to describe them here.
>
> I do not go too deep in the details, the goal is to choose one
> solution and then propose a patch to test. Our preference goes to
> the solution #2.
>
> --
> Solution #1:
> ------------
> The idea here is to detect encoding, encode and register the variable
> during the request initialization (before the script gets control).
> Besides the encoding detection, it is how it works in the current
> implementation (all php versions).
>
> * Init
> - Parse the request into an array.
> - locate _charset_ or use unicode.request_encoding
> - filter/decode/register the variable like it is done now
>
> * Runtime
> Just like now, the auto_globals (with or without jit) are declared and
> ready to be used.
>
> This solution has one advantage, it requires only a few changes in
> the engine. The request processing functions need to be changed
> to detect the encoding.
>
> The main disadvantages are:
> - the lack of flexibility, encoding must be set before the script gets
>   control, using vhost config or htaccess
> - the possible bad encoding detection will force the user to manually
>   parse the raw request (when available).
>
>
> Solution #2: add (true) JIT support for GET/POST/COOKIE/...
> ------------
> Instead of doing all the processing during the init phase, it will be
> done on demand when an input variable is requested, at runtime.
>
> * Init
> - don't parse the request but simply store it for later processing
>
> * Runtime
> - when an input variable is fetched:
>   - encoding is defined using unicode.request_encoding
>   - filter/decode/register the complete array (post,get,...)
>
> The way JIT works has to be changed. It has to process the data
> at runtime instead of registering it at compile time. This is the only
> way to be sure that the user has set the input encoding correctly
> (or has the opportunity to set it).
>
> The main advantage of this solution is the absence of magic for
> the user. The encoding detection can be checked and/or set in time
> by the user before the input processing, it is safe and flexible.
>
> I would also suggest adding a function, filter_input_encoding($type), to
> define the encoding type at runtime instead of using ini_set (which is
> often disabled).
>
> There are no real technical disadvantages, but it requires more work and
> changes in the engine. But these changes will also bring some more
> performance improvements (if (0) $t = $_ENV['foo']; will not trigger
> jit).
>
> --
>
> I would like to hear your ideas, opinions and comments. Especially
> about the possible changes in the engine. Feel free to ask more
> details if my explanations were unclear :)
>
> Regards,
> --Pierre

--
Rui Hirokawa <[EMAIL PROTECTED]>

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
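
For readers following the thread, the difference between the two solutions can be sketched in a few lines. This is only an illustration in Python, not PHP engine code and not anything proposed in the messages above: the class names EagerInput and LazyInput, the set_encoding method and the processed property are all hypothetical, with set_encoding standing in for the filter_input_encoding() function Pierre suggests. The sketch assumes the raw request is a percent-encoded query string.

```python
from urllib.parse import parse_qsl


class EagerInput:
    """Solution #1 (sketch): decode everything at init, encoding fixed up front."""

    def __init__(self, raw: bytes, encoding: str = "utf-8"):
        # The script has no chance to change the encoding before decoding happens.
        self._decoded = dict(parse_qsl(raw.decode("ascii"), encoding=encoding))

    def __getitem__(self, key: str) -> str:
        return self._decoded[key]


class LazyInput:
    """Solution #2 (sketch): store the raw request, decode on first fetch."""

    def __init__(self, raw: bytes, default_encoding: str = "utf-8"):
        # Init phase: keep the raw bytes only, no parsing yet.
        self._raw = raw
        self._encoding = default_encoding
        self._decoded = None  # filled in on first access

    def set_encoding(self, encoding: str) -> None:
        # Hypothetical stand-in for the proposed filter_input_encoding():
        # it only makes sense before the first variable is fetched.
        if self._decoded is not None:
            raise RuntimeError("input has already been decoded")
        self._encoding = encoding

    @property
    def processed(self) -> bool:
        # True once the complete array has been decoded.
        return self._decoded is not None

    def __getitem__(self, key: str) -> str:
        # Runtime phase: decode the complete array the first time any key is read.
        if self._decoded is None:
            text = self._raw.decode("ascii")  # percent-encoded data is ASCII
            self._decoded = dict(parse_qsl(text, encoding=self._encoding))
        return self._decoded[key]


if __name__ == "__main__":
    raw = b"name=caf%E9&id=7"  # 0xE9 is "e acute" in ISO-8859-1

    eager = EagerInput(raw)  # decoded as UTF-8 before the script runs
    print(repr(eager["name"]))

    lazy = LazyInput(raw)
    lazy.set_encoding("iso-8859-1")  # the script sets the encoding in time
    print(repr(lazy["name"]))
```

With the eager variant a wrong encoding guess is already baked in when the script starts, which is the "manually parse the raw request" disadvantage listed under solution #1. With the lazy variant nothing is decoded until a variable is read, so code that never touches the input never pays for the processing.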