Hi Vladimir, > > - The functions __argp_get_display_len and add_length don't write to > > the memory delimited by 'beg', 'end', 'ptr, 'end'. Therefore it is good > > style to declare these parameters as being 'const char *' rather than > > 'char *'. The general rule of thumb is: Use 'const char *' instead of > > 'char *' as long as it does not lead to warnings with "gcc -Wall" and > > does not force you to introduce casts. > > I'll have to change return types to offset instead of pointer to avoid > casts.
Yup, that is the right thing to do. > > - The function __argp_get_display_len looks very similar to mbsnwidth(), > > from module 'mbswidth'. Could you use that function? One of the gnulib > > principles is to reuse code that is already in gnulib, where it makes > > sense. > mbsnwidth returns plainly -1 in case of incorrect multibyte sequence. > This is unfortunately not uncommon that the .mo claims incorrect > encoding mbsnwidth returns -1 in such a case only if the option MBSW_REJECT_INVALID is passed as third argument. If you pass 0, mbsnwidth will not return -1; instead, it will assume width 1 for every invalid byte or unprintable character. > Perhaps we can use an approximation one byte = one column in case of > failure Yes, that's the common assumption. > > - The function __argp_get_display_len looks very similar to mbsnwidth(), > Remaining is the issue due to escape sequences. What is the use case? PO file editors are not required to support editing of strings with control characters. msgfmt warns when a message in a PO file contains an unusual control character like ESC. > it is used in en@boldquot Ah, right. But I don't know how frequently it is used; maybe I and Simon were the only persons to ever use this? If we want to support this, not only mbswidth has to be modified, but basically any code that uses wcwidth - including libunistring. So, until this is discussed (and possibly generalized to more languages than 'en'), I propose to get away without it. > Done but the test is valid only for UTF-8 locales. Should I force some > specific locale? It's impossible to make a test working in all locales > since in case of e.g. ASCII we don't have such characters at all. In such a situation, it is best to split the test into two parts: a part that can be executed on every machine, and a part which can only be executed on a system with a UTF-8 locale. This way, the first part is not skipped just because the system has no UTF-8 locale. Please take a look how it's done in module 'mbsstr-tests': - test-mbsstr1.c is a test that doesn't need a particular locale. - test-mbsstr2.c is a test that requires a UTF-8 locale. We use the French one for simplicity. (If a system does not have fr_FR.UTF-8 installed, it would be unlikely that it has ru_RU.UTF-8 installed.) - test-mbsstr2.sh is a wrapper script that uses the LOCALE_FR_UTF8 value, determined by m4/locale-fr.m4, and invokes test-mbsstr2. > I have the assignment covering any contribution to GRUB. So this should > go pretty quickly. Thanks for pursuing this. > Alternatively I can commit it in GRUB and you'll lift it from there. This would not be so ideal. > + if (wc == '\e' && ptr + 3 < end > + && ptr[1] == '[' && (ptr[2] == '0' || ptr[2] == '1') > + && ptr[3] == 'm') '\e' is not portable, only GCC supports it. Use '\x1b' or '\033' instead. Also, the test ptr + 3 < end is wrong. Should be written as end - ptr > 3 instead. (Think of ptr = 0xFFFFFFD, end = 0xFFFFFFFE on a 32-bit machine.) Sure, on many systems this won't matter, because this memory range is either unmapped or occupied by the stack. But in general you have no guarantee that the memory page from 0xFFFFC000..0xFFFFFFFF will not be used for malloc(). Bruno