Hi Vladimir,

Thank you for the proposed patch.

> As already reported several years ago

I cannot find it in my archives. Maybe that discussion already contained
some useful thoughts or arguments? Can you please point me to it?

> argp counts bytes even when
> actually what matters is the display length. This patch improves the
> situation by counting only leading and standalone UTF-8 bytes. It
> doesn't handle the double-width characters like Chinese sinograms

A program that needs to consider display length - for example for
line wrapping - should
  1) work with any locale encoding. Don't assume that the locale encoding
     is UTF-8.
  2) work with Chinese (double-width) ideographs correctly, just as it
     should work with Russian (single-width) letters.

The easiest way to satisfy these two requirements is to base the code on
either
  * the function mbswidth (gnulib module mbswidth) and possibly also mbiter
    or mbuiter, or
  * the gnulib module unilbrk/ulc-width-linebreaks, which contains a
    complete line-breaking algorithm.

Can you rewrite your patch to this effect?
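
To illustrate the difference between counting bytes and counting display
columns, here is a minimal sketch that uses only the POSIX functions
mbrtowc and wcwidth instead of gnulib. The name display_width is
hypothetical, not a gnulib function, and the handling of invalid or
non-printable input is an assumption of this sketch; mbswidth covers
these cases more carefully and is what the patch should use.

```c
#define _XOPEN_SOURCE 700   /* needed for the wcwidth declaration */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of screen columns occupied by the
   multibyte string S in the current locale encoding, or -1 if S
   contains an invalid or incomplete multibyte sequence.
   Assumption of this sketch: non-printable characters count as 0.  */
static int
display_width (const char *s)
{
  mbstate_t state;
  size_t remaining;
  int width = 0;

  assert (s != NULL);
  remaining = strlen (s);
  memset (&state, 0, sizeof state);

  while (remaining > 0)
    {
      wchar_t wc;
      /* Decode one character according to the locale encoding.  */
      size_t n = mbrtowc (&wc, s, remaining, &state);
      int w;

      if (n == (size_t) -1 || n == (size_t) -2)
        return -1;              /* invalid or truncated sequence */
      if (n == 0)
        break;                  /* embedded NUL, cannot occur here */

      /* wcwidth yields 2 for double-width characters, 1 for ordinary
         letters, 0 for combining characters, -1 for non-printable.  */
      w = wcwidth (wc);
      if (w > 0)
        width += w;

      s += n;
      remaining -= n;
    }
  return width;
}
```

The caller is expected to have called setlocale (LC_ALL, "") beforehand,
so that mbrtowc decodes according to the user's locale encoding, whatever
it is, rather than assuming UTF-8.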

Also, such tricky issues should be checked in the test suite. Can you
please also provide a test program, some input data, and the expected
output for this data? We can then turn it into a gnulib test.

Bruno

