Hi!


I'm currently developing a script that generates OpenOffice SXW files
by filling content.xml (which is UTF-8 encoded) with database content.
While doing this I found that utf8_encode('“') (char code 147) returns
the two bytes 0xC2 0x93. When I check the whole result in OpenOffice, the
character is displayed as a square (unknown character?!). So I ran some
tests with UTF-8 conversion (including the mb_* functions) and noticed that
the characters in the range 128 to 159 returned by utf8_encode() don't seem
to match the standard. As mentioned above, '“' comes back as 0xC2 0x93 but
should be 0xE2 0x80 0x9C (which is what you get when using UltraEdit for the
conversion).
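Here is a minimal sketch that reproduces what I am seeing. It rests on two assumptions I have not verified against my real data: that the database content is Windows-1252 rather than ISO-8859-1, and that the mbstring extension is available. Converting from 'ISO-8859-1' with mb_convert_encoding() is used as a stand-in for utf8_encode(), which is documented as an ISO-8859-1 to UTF-8 conversion.

```php
<?php
// Byte 147 (0x93): left double quotation mark in Windows-1252,
// but the control character U+0093 in ISO-8859-1.
$byte = chr(147);

// Stand-in for utf8_encode(): the input is treated as ISO-8859-1.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Same byte, but treated as Windows-1252 (assumption about my data).
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c293   (U+0093, shown as a square)
echo bin2hex($asCp1252), "\n"; // e2809c (U+201C, the quotation mark)
```

If that assumption holds, the square in OpenOffice would simply be U+0093, which has no visible glyph.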



Can anyone explain what is going on here?



I'm not familiar with this UTF-8 / bit-conversion stuff, but I don't think
PHP does what it's supposed to do here. As a first workaround I simply
wrote a custom_utf8_encode() that uses its own character map to override
this misbehaviour (see below). Can someone help me out with this strange bug?!



Regards

Bjoern Kraus





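function custom_utf8_encode($str)
{
    // UTF-8 byte sequences for char codes 128-159, written as hex escapes so
    // the map survives any file or mail encoding. Codes without a printable
    // character (129, 141, 143, 144, 157) map to an empty string.
    $chrMap = array(
        // U+20AC, (none), U+201A, U+0192
        128 => "\xE2\x82\xAC", 129 => '',             130 => "\xE2\x80\x9A", 131 => "\xC6\x92",
        // U+201E, U+2026, U+2020, U+2021
        132 => "\xE2\x80\x9E", 133 => "\xE2\x80\xA6", 134 => "\xE2\x80\xA0", 135 => "\xE2\x80\xA1",
        // U+02C6, U+2030, U+0160, U+2039
        136 => "\xCB\x86",     137 => "\xE2\x80\xB0", 138 => "\xC5\xA0",     139 => "\xE2\x80\xB9",
        // U+0152, (none), U+017D, (none)
        140 => "\xC5\x92",     141 => '',             142 => "\xC5\xBD",     143 => '',
        // (none), U+2018, U+2019, U+201C
        144 => '',             145 => "\xE2\x80\x98", 146 => "\xE2\x80\x99", 147 => "\xE2\x80\x9C",
        // U+201D, U+2022, U+2013, U+2014
        148 => "\xE2\x80\x9D", 149 => "\xE2\x80\xA2", 150 => "\xE2\x80\x93", 151 => "\xE2\x80\x94",
        // U+02DC, U+2122, U+0161, U+203A
        152 => "\xCB\x9C",     153 => "\xE2\x84\xA2", 154 => "\xC5\xA1",     155 => "\xE2\x80\xBA",
        // U+0153, (none), U+017E, U+0178
        156 => "\xC5\x93",     157 => '',             158 => "\xC5\xBE",     159 => "\xC5\xB8"
    );

    $newStr = '';
    $len = strlen($str);

    // Walk the input byte by byte: codes 128-159 come from the map,
    // everything else goes through utf8_encode() as before.
    for ($i = 0; $i < $len; $i++) {
        $chrVal = ord($str[$i]);
        if ($chrVal > 127 && $chrVal < 160) {
            $newStr .= $chrMap[$chrVal];
        } else {
            $newStr .= utf8_encode($str[$i]);
        }
    }

    return $newStr;
}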
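For comparison, a shorter variant that might replace the hand-written map. This is only a sketch under the assumption that the input really is Windows-1252 and that mbstring is installed. The helper name cp1252_to_utf8() is made up for this example and is not a built-in PHP function.

```php
<?php
// Hypothetical helper: convert a Windows-1252 string to UTF-8 in one call.
// Assumes the mbstring extension is loaded and the input is Windows-1252.
function cp1252_to_utf8($str)
{
    return mb_convert_encoding($str, 'UTF-8', 'Windows-1252');
}

// Curly quotes (bytes 147 and 148) around plain ASCII text.
echo bin2hex(cp1252_to_utf8("\x93Hi\x94")), "\n"; // e2809c4869e2809d
```

Bytes above 159 are the same in both character sets, so ordinary accented letters should come out unchanged compared to utf8_encode().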

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
