On Fri, Nov 21, 2025 at 11:04:36PM +0000, Gavin Smith wrote:
> texi2any gives encoding errors output for some of the glyph commands on
> a Solaris 11 system.  This is the case for @expansion{}.  I reduced it
> to the following input:
> 
> Then, when I run '../tta/perl/texi2any.pl --html test.texi --no-split 
> --force',
> I get the following errors:
> 
> C:encoding error at byte 0xe2
> C:encoding error at byte 0x86
> C:encoding error at byte 0xa6
> 
> 0xe2 0x86 0xa6 is the UTF-8 for U+21A6 (↦), the Unicode character output for
> @expansion{}.  The error messages are printed in the 'encode_with_iconv'
> function in tta/C/main/utils.c.

> It appears to be from the use of the "us-ascii//TRANSLIT" encoding in
> 'unicode_to_transliterate' in main/node_name_normalization.c.  My
> guess is that this system either doesn't have such an encoding or doesn't
> support some characters for transliteration.

The issue you uncovered with transliteration in C is more general than
what you saw with @expansion on a sectioning command line.  Wherever
transliteration happens, encoding errors will be printed and the
result will be suboptimal for some of the characters corresponding to
@-commands.  I do not actually think that suboptimal output is so bad,
because there should not be any case where transliteration needs to be
externally consistent.  But the error messages are definitely bad in
my view.

One possibility would be to ignore the encoding errors when calling
encode_with_iconv from unicode_to_transliterate.  That would actually
be my preference.
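
To make the first option concrete, here is a rough sketch of what I
have in mind.  It is not the actual code in tta/C/main/utils.c: the
function name convert_sketch and the report_errors argument are made
up for illustration, and I am assuming a glibc-style iconv prototype
(char ** for the input buffer).  The only point is that a caller such
as unicode_to_transliterate could ask for the bytes that cannot be
converted to be skipped without any message.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for encode_with_iconv, not the texi2any code.
   Converts the NUL-terminated string S with CD.  When a byte cannot be
   converted, it is skipped; a message is printed only if REPORT_ERRORS
   is nonzero.  Returns a malloc'ed string, or NULL on other failures. */
char *
convert_sketch (iconv_t cd, const char *s, int report_errors)
{
  assert (s != NULL);

  size_t in_left = strlen (s);
  size_t size = in_left + 16;
  size_t used = 0;
  /* Assumed prototype takes char **; the input bytes are not modified. */
  char *in_ptr = (char *) s;
  char *out = malloc (size);

  if (!out)
    return NULL;

  while (in_left > 0)
    {
      char *out_ptr = out + used;
      size_t out_left = size - used - 1;   /* keep room for the NUL */
      size_t r = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);

      used = out_ptr - out;
      if (r != (size_t) -1)
        continue;                /* all input consumed, in_left is 0 */

      if (errno == E2BIG)
        {
          /* Output buffer full: grow it and resume. */
          char *tmp;
          size *= 2;
          tmp = realloc (out, size);
          if (!tmp)
            {
              free (out);
              return NULL;
            }
          out = tmp;
        }
      else if (errno == EILSEQ || errno == EINVAL)
        {
          /* Unconvertible or incomplete input: skip one byte. */
          if (report_errors)
            fprintf (stderr, "encoding error at byte 0x%02x\n",
                     (unsigned char) *in_ptr);
          in_ptr++;
          in_left--;
        }
      else
        {
          free (out);
          return NULL;
        }
    }

  out[used] = '\0';
  return out;
}
```

With report_errors set to 0 the character is silently dropped (or
replaced, depending on what the iconv implementation does), which is
the "suboptimal but quiet" behaviour described above.  Other callers
could keep passing 1 and would see the same messages as now.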

Another possibility would be to design configure tests that check whether
transliteration with the "us-ascii//TRANSLIT" encoding gives correct
results/doesn't set EILSEQ.  This is not my preference because it is
more complex, and also because the test would necessarily be a run test,
which cannot be executed when cross-compiling, so in that case the Perl
transliteration would always be chosen.
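
For reference, the core of such a run test could look roughly like the
following.  Again this is only a sketch: translit_probe is a made-up
name, the choice of U+21A6 as the probe character is simply taken from
the report, and "correct results" is reduced here to "no error and
only ASCII bytes in the output", which may well be too weak.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical probe, not part of texi2any.  Returns 1 if converting
   U+21A6 from UTF-8 to TO_ENCODING succeeds without error and yields
   at least one byte, all of them ASCII.  Returns 0 otherwise. */
int
translit_probe (const char *to_encoding)
{
  char input[] = "\xe2\x86\xa6";     /* UTF-8 for U+21A6 */
  char output[32];
  char *in_ptr = input;
  char *out_ptr = output;
  size_t in_left = strlen (input);
  size_t out_left = sizeof (output);
  size_t r, n, i;

  iconv_t cd = iconv_open (to_encoding, "UTF-8");
  if (cd == (iconv_t) -1)
    return 0;                        /* encoding not available */

  r = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close (cd);

  if (r == (size_t) -1 || in_left != 0)
    return 0;                        /* EILSEQ or another failure */

  n = out_ptr - output;
  if (n == 0)
    return 0;                        /* nothing was produced */
  for (i = 0; i < n; i++)
    if ((unsigned char) output[i] >= 0x80)
      return 0;                      /* output is not plain ASCII */

  return 1;
}
```

The test program's main would call translit_probe
("us-ascii//TRANSLIT") and exit with status 0 when it returns 1.  In
an AC_RUN_IFELSE call the cross-compiling case has to be given
separately as the fourth argument, and that is where we would have to
fall back to the Perl transliteration.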

-- 
Pat
