I've noticed different mbstring behavior between PHP 4.2.2 and PHP
4.3.9 -- both on RedHat 8 -- in terms of how the internal encoding
affects the script.

In 4.2.2, the encoding translation appeared to work okay and would
convert Shift_JIS into UTF-8 on incoming requests.  We didn't try any
other encodings since this was our primary concern and worked well.  The
internal_encoding setting in the php.ini file was set to UTF-8.  Our
language file (a very simple PHP array whose values are the translated
text) was in Shift_JIS, and sending it to the browser as-is was no
problem.  We send the Shift_JIS language file entries to the browser
[via Smarty], along with some other text that is stored in UTF-8 and run
through mb_convert_encoding to convert it to Shift_JIS as well.
All in all, this works as expected.

Now we're trying to upgrade to PHP 4.3.9, and I can find no easy way to
get Shift_JIS to work.  In the existing setup, it would just return
UTF-8 or garbled characters.  In other words, mb_convert_encoding was
not doing its job, and the Shift_JIS language file entries would not
even display.  Manually converting the language file from Shift_JIS to
UTF-8 and then running all the elements through mb_convert_encoding
apparently did nothing either -- unless I first called
mb_internal_encoding() and set it to Shift_JIS (setting this in the
php.ini file worked as well).  Then the characters would be displayed
correctly in Shift_JIS.  I'm not sure this is the correct behavior,
though.  It seems to me that the internal encoding should almost always
be UTF-8, and mb_convert_encoding should work regardless of the internal
encoding.
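
A minimal sketch of a check that would separate the conversion itself
from anything the output handler or the browser does afterwards.  The
helper name utf8_to_sjis is made up for illustration, and the input is
written as raw UTF-8 bytes so the encoding of the script file does not
matter.  It dumps the converted bytes as hex under both internal
encodings:

```php
<?php
// Hypothetical helper (not from our code base): convert UTF-8 text to
// Shift_JIS, naming both the target and the source encoding explicitly.
function utf8_to_sjis($text)
{
    return mb_convert_encoding($text, 'SJIS', 'UTF-8');
}

// The three characters U+65E5 U+672C U+8A9E, spelled out as UTF-8 bytes.
$utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

// Convert once with each internal encoding and keep the result as hex,
// which is plain ASCII and so is not altered by any output conversion.
mb_internal_encoding('UTF-8');
$a = bin2hex(utf8_to_sjis($utf8));

mb_internal_encoding('SJIS');
$b = bin2hex(utf8_to_sjis($utf8));

echo $a, "\n", $b, "\n";
```

If the conversion really is independent of the internal encoding, both
lines should be identical; the Shift_JIS bytes for these characters are
93fa967b8cea.  If the hex is right but the page is still garbled, the
problem would more likely sit in the output stage than in
mb_convert_encoding.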

I don't know the consequences of calling mb_internal_encoding at run
time, or what that means for database interactions, curl interactions
[PEAR SOAP], etc.
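
One way to limit those side effects, as a sketch only: change the
internal encoding just for the duration of the conversion and restore it
right after.  The function name convert_scoped is hypothetical, and
whether this is actually needed on a correctly working build is exactly
what I'm unsure about.

```php
<?php
// Hypothetical wrapper: temporarily switch the internal encoding to the
// target encoding, convert, then restore the previous setting so that
// later code (database, curl, SOAP) sees the original internal encoding.
function convert_scoped($text, $to, $from)
{
    $previous = mb_internal_encoding();   // no argument: returns current value
    mb_internal_encoding($to);
    $converted = mb_convert_encoding($text, $to, $from);
    mb_internal_encoding($previous);
    return $converted;
}

mb_internal_encoding('UTF-8');
$sjis = convert_scoped("\xE6\x97\xA5\xE6\x9C\xAC", 'SJIS', 'UTF-8');
echo bin2hex($sjis), "\n";
echo mb_internal_encoding(), "\n";
```

This keeps the run-time change local to one call instead of leaving
Shift_JIS as the internal encoding for the rest of the request.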

A few other observations:
- compiling with --enable-zend-multibyte made no difference
- compiling with --enable-mbstring=all versus plain --enable-mbstring
made no difference
- the mbstring.language php.ini setting didn't appear to make a
difference
- calling mb_internal_encoding('SJIS') was the only way to make
mb_convert_encoding($var, 'SJIS', 'UTF-8') work properly; otherwise
mb_convert_encoding just spit out garbage
- http_input was set to "UTF-8, SJIS", as was detect_order
- the http_output setting was set to "pass" in the ini file, and
mb_output_handler is used at run-time with the preferred encoding
(either UTF-8 or SJIS)
- substitute character and function overloading are both off/disabled
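
For reference, the settings in the list above would look roughly like
this in php.ini.  This is a reconstruction from the description, not a
copy of the actual file.  The values for mbstring.encoding_translation
and mbstring.substitute_character are assumptions (the list only says
translation worked under 4.2.2 and that the substitute character is
off), and mbstring.language is left out because its value isn't given.

```ini
; Reconstructed from the description above, not the real file
mbstring.internal_encoding    = UTF-8
mbstring.http_input           = UTF-8,SJIS
mbstring.detect_order         = UTF-8,SJIS
mbstring.http_output          = pass
mbstring.func_overload        = 0
; Assumed values, not stated in the list:
mbstring.encoding_translation = On
mbstring.substitute_character = none
```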

Any suggestions?

Thanks,
Al Baker


