Re: Endianness-specific

Bruno Haible Sat, 06 Oct 2007 11:22:47 -0700

Hi Ludovic,

> I'm trying to implement functions that convert a string in the current
> locale encoding to its UTF-{16,32} representation, for a given
> endianness.


This kind of task is outside of the scope of the uniconv/* modules.
'unistr' and 'uniconv' deal with UTF-{8,16,32} as an internal representation
of strings in memory; therefore they assume machine-dependent endianness
and alignment - and therefore can access every unit in a single memory
access.

If the endianness or alignment is different, the code needs to access
every unit byte after byte; this is not the way it's done in the 'unistr'
and 'uniconv' libraries.
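
To illustrate what "byte after byte" access means, here is a minimal sketch
in plain C. The helper names (read_u16be, read_u16le, write_u16be) are
hypothetical and are not part of 'unistr' or 'uniconv'; they only show how a
UTF-16 unit with a fixed byte order can be read or stored without assuming
anything about the machine's endianness or the buffer's alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  */

/* Read one UTF-16 unit stored as two bytes in big-endian order.
   P may point to an unaligned address.  */
static uint16_t
read_u16be (const unsigned char *p)
{
  return (uint16_t) (((unsigned int) p[0] << 8) | p[1]);
}

/* Read one UTF-16 unit stored as two bytes in little-endian order.  */
static uint16_t
read_u16le (const unsigned char *p)
{
  return (uint16_t) (p[0] | ((unsigned int) p[1] << 8));
}

/* Store one UTF-16 unit as two bytes in big-endian order.  */
static void
write_u16be (unsigned char *p, uint16_t unit)
{
  p[0] = (unsigned char) (unit >> 8);
  p[1] = (unsigned char) (unit & 0xFF);
}
```

Each unit costs two byte loads plus a shift, instead of the single aligned
load that the in-memory representation allows.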

Therefore I would recommend using the mem_cd_iconveh function from the
'striconveh' module, with FROMCODE = locale_charset() and TOCODE =
"UTF-16BE" or "UTF-16LE" (or vice versa). Or use mem_iconveh if you don't
want to reuse the conversion descriptors.

The str_cd_iconveh and str_iconveh functions are not usable here because they
look for the end of string via strlen(), and UTF-16 or UTF-32 data contains
NUL bytes in the middle of the string.

I recommend the 'striconveh' module here over the 'striconv' module, because
it will work even with Solaris iconv() which can convert from anything to
UTF-8 and vice versa, but cannot convert directly e.g. between ISO-8859-2
and UTF-16LE. The 'striconveh' module does the conversion in two steps in
such a case.
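
For illustration only, the direct single-step case can be sketched with
plain POSIX iconv(). This is not the 'striconveh' code: it has no two-step
fallback through UTF-8 and no error handler. The function name convert_mem
is hypothetical, and the output buffer size relies on the assumption that
each input byte yields at most 4 output bytes. What it does show is an
explicit-endianness TOCODE and a result returned with an explicit length
rather than a NUL terminator.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert SRCLEN bytes at SRC from FROMCODE to TOCODE.
   Returns a malloc()ed buffer and stores its length in bytes in *LENGTHP,
   or returns NULL on failure.  The result is not NUL-terminated.  */
static char *
convert_mem (const char *src, size_t srclen,
             const char *fromcode, const char *tocode, size_t *lengthp)
{
  iconv_t cd = iconv_open (tocode, fromcode);
  if (cd == (iconv_t) -1)
    return NULL;

  /* Assumption: at most 4 output bytes per input byte.  */
  size_t outsize = 4 * srclen + 4;
  char *out = malloc (outsize);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  /* Some platforms declare the second iconv() argument as const char **;
     this cast matches the POSIX and glibc prototype.  */
  char *inptr = (char *) src;
  size_t inleft = srclen;
  char *outptr = out;
  size_t outleft = outsize;

  /* Convert the input, then flush any pending shift state.  */
  if (iconv (cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1
      || iconv (cd, NULL, NULL, &outptr, &outleft) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }

  iconv_close (cd);
  *lengthp = outsize - outleft;
  return out;
}
```

A call such as convert_mem (s, n, "UTF-8", "UTF-16BE", &len) would be the
single-step case; on an iconv() that lacks a direct converter for the pair,
iconv_open fails and this sketch simply returns NULL.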

Bruno
