ID: 35241
User updated by: mikx at mikx dot de
Reported By: mikx at mikx dot de
Status: Bogus
Bug Type: WDDX related
Operating System: Linux, Windows
PHP Version: 5CVS-2005-11-16 (snap)
New Comment:
Thanks for that info. But why does this mean it is not a bug? Is
decoding to Latin-1 expected behavior or just a side effect? Can the
default encoding of libxml2 be influenced? Will this become a
regression if PHP ever uses UTF-8 properly throughout the engine?
Previous Comments:
------------------------------------------------------------------------
[2005-11-28 17:55:12] [EMAIL PROTECTED]
It's different because we now use libxml2 instead of the old expat parser.
------------------------------------------------------------------------
[2005-11-28 17:27:00] mikx at mikx dot de
This bug is not bogus in my opinion (reopening). wddx_deserialize
cannot properly decode a valid, UTF-8 encoded WDDX packet that is
marked as UTF-8 and comes from another source (or was written with a
plain UTF-8 text editor, for that matter).
If I am wrong and this is expected behavior, please give me a link to
the documentation stating that an implicit conversion to Latin-1 is
expected. And please explain why, and in which version of PHP, this
behavior changed; in PHP 4.3.9 it is different.
------------------------------------------------------------------------
[2005-11-17 17:45:24] mikx at mikx dot de
Ilia, the data I have is already UTF-8 encoded inside the database. And
as output 5 of 6 in my test case shows, even if I specify a UTF-8 XML
header on a valid UTF-8 encoded packet, wddx_deserialize automatically
decodes the data to Latin-1.
This has nothing to do with wddx_serialize directly, but of course:
encoding data that is already UTF-8 a second time would work if I only
serialized and deserialized in PHP 5. But it would produce a corrupted,
double-UTF-8-encoded WDDX file that does not work properly with other
WDDX tools.
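The double-encoding problem can be shown at the byte level. This is a minimal sketch written in Python purely as a language-neutral illustration, not the WDDX code path itself; the helper `latin1_to_utf8` is hypothetical and assumes it behaves like PHP's `utf8_encode()`, which reads every input byte as an ISO-8859-1 character.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical stand-in for PHP's utf8_encode(): every byte is read
    as an ISO-8859-1 character and re-encoded as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# "ü" as it is already stored in the database: UTF-8, two bytes (C3 BC).
stored = "ü".encode("utf-8")

# Encoding it again treats C3 and BC as two Latin-1 characters,
# so the packet now carries four bytes that a UTF-8 reader shows as "Ã¼".
double = latin1_to_utf8(stored)

# A PHP-only round trip still "works": decoding once (UTF-8 -> Latin-1)
# gives back the original UTF-8 bytes, which hides the corruption.
round_trip = double.decode("utf-8").encode("latin-1")

print(stored.hex())            # c3bc
print(double.hex())            # c383c2bc
print(double.decode("utf-8"))  # Ã¼
print(round_trip == stored)    # True
```

Only a consumer that performs the matching extra decode recovers "ü"; any other WDDX tool sees the four-byte sequence as two unrelated characters.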
Currently wddx_deserialize applies utf8_decode to everything not clearly
marked as already being Latin-1. Therefore wddx_deserialize has a bug:
it cannot decode a valid UTF-8 encoded WDDX packet to a UTF-8 string.
Or, at the very least, it is not documented anywhere how to influence
the behavior of wddx_deserialize.
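The effect of an implicit utf8_decode on a valid UTF-8 value can be sketched the same way. Again Python is used only for illustration; `utf8_to_latin1` is a hypothetical helper that assumes PHP's documented `utf8_decode()` behaviour of converting to ISO-8859-1 and replacing characters outside that range with `?`.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Hypothetical stand-in for PHP's utf8_decode(): UTF-8 in,
    ISO-8859-1 out, characters with no Latin-1 code point become '?'."""
    return data.decode("utf-8").encode("latin-1", errors="replace")


# A valid UTF-8 value, as declared by the packet's XML header.
packet_value = "Grüße €5".encode("utf-8")
result = utf8_to_latin1(packet_value)

print(result)  # b'Gr\xfc\xdfe ?5'

# The result is Latin-1, so a UTF-8 application can no longer read it,
# and the euro sign is lost for good.
try:
    result.decode("utf-8")
except UnicodeDecodeError:
    print("no longer valid UTF-8")
```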
------------------------------------------------------------------------
[2005-11-17 16:49:09] [EMAIL PROTECTED]
To handle UTF-8 data, you need to apply the utf8_encode() function to
the data itself and add an XML header identifying the data as UTF-8.
------------------------------------------------------------------------
[2005-11-16 16:07:59] mikx at mikx dot de
I tried the Windows snapshot you linked to (PHP Version
5.1.0RC5-dev). The result for the test case is exactly the same as with
5.0.5.
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/35241