Re: [PHP-DEV] Re: PHP Unicode support design document

Rasmus Lerdorf Mon, 15 Aug 2005 15:09:44 -0700

I think the main issue here is that if your script encoding is set to
UTF-8 and you do everything in UTF-8 then these large blocks of UTF-8
are going to make a UTF-8 -> UTF-16 -> UTF-8 conversion roundtrip on
every request.  It would be nice if we could somehow avoid that.
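
A rough sketch of how that could be avoided, along the lines of Andrei's suggestion quoted below: tag each inline HTML block with the encoding of the script it came from, pass the bytes through untouched when that matches the output encoding, and only otherwise reach for a cached converter. Every name here (inline_html_block, plan_output, and the converter struct standing in for a real converter handle) is made up for illustration; none of this is engine code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: an inline HTML block tagged with the encoding of the
 * script it was compiled from (the "separate opcode" idea). */
typedef struct {
    const char *data;
    size_t      len;
    const char *script_encoding;
} inline_html_block;

/* Hypothetical stand-in for a real converter handle. */
typedef struct {
    char name[32];
    int  in_use;
} converter;

typedef enum { OUT_PASSTHROUGH, OUT_TRANSCODE, OUT_ERROR } out_action;

#define CONV_CACHE_SIZE 4
static converter conv_cache[CONV_CACHE_SIZE];
static int conv_opens; /* how many converters were actually opened */

/* Case-insensitive ASCII comparison of two encoding names. */
static int same_encoding(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the cached converter for `name`, opening one only on a miss.
 * Returns NULL if the cache is full or the name does not fit. */
static converter *get_converter(const char *name)
{
    converter *free_slot = NULL;
    for (int i = 0; i < CONV_CACHE_SIZE; i++) {
        if (conv_cache[i].in_use && same_encoding(conv_cache[i].name, name))
            return &conv_cache[i];
        if (!conv_cache[i].in_use && free_slot == NULL)
            free_slot = &conv_cache[i];
    }
    if (free_slot == NULL || strlen(name) >= sizeof free_slot->name)
        return NULL;
    strcpy(free_slot->name, name);
    free_slot->in_use = 1;
    conv_opens++;
    return free_slot;
}

/* Decide what the output layer has to do with one block. On
 * OUT_TRANSCODE, *conv is the converter to use; otherwise it is NULL. */
static out_action plan_output(const inline_html_block *b,
                              const char *output_encoding,
                              converter **conv)
{
    assert(b != NULL && output_encoding != NULL && conv != NULL);
    *conv = NULL;
    if (same_encoding(b->script_encoding, output_encoding))
        return OUT_PASSTHROUGH; /* no conversion, no roundtrip */
    *conv = get_converter(b->script_encoding);
    return *conv ? OUT_TRANSCODE : OUT_ERROR;
}
```

With something like this, the all-UTF-8 case never touches a converter, and scripts in other encodings open each converter once rather than on every output call.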


-Rasmus

Andi Gutmans wrote:
> Wouldn't it be easiest to have inline HTML become IS_UNICODE and then
> not deal with the problem of remembering what the script encoding was? I
> thought that's what we already do today.
> 
> Andi
> 
> At 12:37 PM 8/10/2005 -0700, Andrei Zmievski wrote:
> 
>> I did not have time to write the full reply earlier so here goes.
>>
>> Even if we modify the output layer to be aware of various types of
>> strings coming down the pipe, it would still need to know the encoding
>> of IS_STRING's in order to convert them to the output encoding. This
>> presents a particular problem for inline HTML blocks, as they are
>> supposed to be in the script encoding, but by the time the HTML is
>> sent to the output layer, we don't know what the source script
>> encoding was for these HTML blocks. This problem exists in the current
>> implementation also, because the ZEND_ECHO opcode does not keep track
>> of what the script encoding was. This needs to be fixed, obviously.
>>
>> One approach could be to implement a separate opcode for inline HTML
>> blocks and store the name of the script encoding it came from in the
>> opcode. Then when the output layer (or whatever else) gets to it, we
>> can check the encoding name in the opcode vs. the output encoding and
>> perform transcoding if necessary. This does mean that we may need to
>> dynamically open and close converters on each output (if there are
>> different script encodings floating around), but that can be
>> alleviated by keeping some sort of converter cache around.
>>
>> I am open to other ideas.
>>
>> -Andrei
>>
>> On Aug 10, 2005, at 8:34 AM, Andrei Zmievski wrote:
>>
>>> That's not true, actually. 'echo' and 'print' resolve to the ZEND_ECHO
>>> opcode, which calls zend_print_variable(), which in turn calls
>>> zend_make_printable_zval(). Now, this last function is supposed to
>>> take a zval and turn it into a printable string, of course, which is
>>> then output using utility_functions->write_function aka
>>> php_body_write(). All that function cares about is how to output a
>>> binary string. So, if we want to bubble the conversion down to the
>>> output layer, we probably need to change the write function so that
>>> it takes a void* and a type and knows how to deal with them
>>> appropriately.
> 
> 
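For the write-function change described in the oldest message above (a void* plus a type), a minimal sketch might look like the following. It assumes the output encoding is UTF-8 and that Unicode strings arrive as UTF-16 code units. The names out_type, out_buffer and typed_write are invented for illustration and are not part of the Zend API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags, loosely modelled on IS_STRING / IS_UNICODE. */
typedef enum { OUT_BINARY, OUT_UTF16 } out_type;

/* Fixed-size buffer standing in for the real output layer. */
typedef struct {
    char   buf[256];
    size_t len;
} out_buffer;

/* Append one code point to the buffer as UTF-8; -1 if it does not fit. */
static int put_utf8(out_buffer *o, uint32_t cp)
{
    unsigned char tmp[4];
    size_t n;
    if (cp < 0x80) {
        tmp[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        tmp[0] = (unsigned char)(0xC0 | (cp >> 6));
        tmp[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        tmp[0] = (unsigned char)(0xE0 | (cp >> 12));
        tmp[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        tmp[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        tmp[0] = (unsigned char)(0xF0 | (cp >> 18));
        tmp[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        tmp[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        tmp[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    if (o->len + n > sizeof o->buf)
        return -1;
    memcpy(o->buf + o->len, tmp, n);
    o->len += n;
    return 0;
}

/* Write function that takes a void* and a type.
 * `len` is in bytes for OUT_BINARY and in 16-bit units for OUT_UTF16. */
static int typed_write(out_buffer *o, const void *data, size_t len, out_type type)
{
    assert(o != NULL && (data != NULL || len == 0));
    if (type == OUT_BINARY) {
        /* Binary strings are copied through untouched. */
        if (o->len + len > sizeof o->buf)
            return -1;
        if (len > 0)
            memcpy(o->buf + o->len, data, len);
        o->len += len;
        return 0;
    }
    const uint16_t *u = data;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = u[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            u[i + 1] >= 0xDC00 && u[i + 1] <= 0xDFFF) {
            /* Combine a surrogate pair into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (u[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate becomes U+FFFD */
        }
        if (put_utf8(o, cp) != 0)
            return -1;
    }
    return 0;
}
```

The point of the sketch is only that the conversion happens once, at the output boundary, and that binary strings never go through it.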

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
