On 08/02/2017 08:15 PM, Rob Clark wrote:
> On Wed, Aug 2, 2017 at 1:05 PM, Heinrich Schuchardt <xypron.g...@gmx.de> wrote:
>> On 08/02/2017 11:38 AM, Rob Clark wrote:
>>> On Tue, Aug 1, 2017 at 10:22 PM, Heinrich Schuchardt <xypron.g...@gmx.de> wrote:
>>>> On 07/31/2017 02:42 PM, Rob Clark wrote:
>>>>> This is convenient for efi_loader which deals a lot with utf16.
>>>>>
>>>>> Signed-off-by: Rob Clark <robdcl...@gmail.com>
>>>>> ---
>>>>>  lib/vsprintf.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>>>  1 file changed, 37 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>>>>> index 874a2951f7..84e157ecb1 100644
>>>>> --- a/lib/vsprintf.c
>>>>> +++ b/lib/vsprintf.c
>>>>> @@ -270,6 +270,35 @@ static char *string(char *buf, char *end, char *s, int field_width,
>>>>>          return buf;
>>>>>  }
>>>>>
>>>>> +static size_t strnlen16(const u16* s, size_t count)
>>>>> +{
>>>>> +        const u16 *sc;
>>>>> +
>>>>> +        for (sc = s; count-- && *sc; ++sc)
>>>>> +                /* nothing */;
>>>>> +        return sc - s;
>>>>> +}
>>>>> +
>>>>> +static char *string16(char *buf, char *end, u16 *s, int field_width,
>>>>> +                int precision, int flags)
>>>>> +{
>>>>> +        int len, i;
>>>>> +
>>>>> +        if (s == NULL)
>>>>> +                s = L"<NULL>";
>>>>
>>>> The L notation creates a wchar_t string. The width of wchar_t depends on
>>>> gcc compiler flag -fshort-wchar.
>>>>
>>>> vsprintf.c is not compiled with -fshort-wchar. So change this to
>>>>
>>>> const u16 null[] = { '<', 'N', 'U', 'L', 'L', '>', 0};
>>>> s = null;
>>>
>>> oh, I have another patch that adds -fshort-wchar globally.. which I
>>> probably should have split out and sent with this.
>>>
>>> The problem is we cannot mix objects using short-wchar and ones that
>>> don't without a compiler warning. Travis would complain a lot more
>>> but I guess BOOTEFI_HELLO is not normally enabled.
>>>
>>> With addition of efi_bootmgr.c we really want to be able to use
>>> L"string" to be u16..
>>> and I don't think u-boot has any good reason to
>>> use 32b wchar.
>>>
>>> But maybe for this code I should use wchar_t instead of u16.
>>>
>>> BR,
>>> -R
>>
>> ext4 filenames may contain letters with Unicode values > 2**16,
>> e.g. using Takri letters: 𑚀𑚁𑚂
>>
>> So ext4ls probably should be enabled to display these on a Unicode console.
>>
>> Using -fshort-wchar globally is not necessary. Only UEFI requires 16 bit
>> wchar_t. We should rather not enforce the UEFI standard on the rest of
>> the code.
>
> The alternative is disabling a gcc warning about mixing 32b and 16b
> wchar.. and really mixing 32b and 16b wchar seems like a bad idea.
>
> We could use -fshort-wchar only if EFI_LOADER is enabled. Technically
> if we are a UEFI implementation, we do not need to have ext2/ext4 (or
> really anything other than fat/vfat).

You can avoid the problem of variable width wchar by using constants
starting with u (e.g. u"Hello world") which are char16_t (introduced
with C11, #include <uchar.h>) and converting to utf-8 for console output.
This way we do not need -fshort-wchar at all.

Best regards

Heinrich

>
>>>
>>>>> +
>>>>> +        len = strnlen16(s, precision);
>>>>> +
>>>>> +        if (!(flags & LEFT))
>>>>> +                while (len < field_width--)
>>>>> +                        ADDCH(buf, ' ');
>>>>> +        for (i = 0; i < len; ++i)
>>>>> +                ADDCH(buf, *s++);
>>
>> I would prefer to see a conversion to UTF-8 here.
>>
>> Conversion from 32bit Unicode (Or the capped 16bit Unicode of EFI) is
>> quite easy. This is what I used in another project:
>>
>> uint32_t u = s[i];
>> char c[5];
>> if (u < 0x80) {
>>         c[0] = u & 0x7F;
>>         c[1] = 0;
>>         str.append(c);
>> } else if (u < 0x800) {
>>         c[1] = 0x80 | (u & 0x3F);
>>         u >>= 6;
>>         c[0] = 0xC0 | (u & 0x1F);
>>         c[2] = 0;
>>         str.append(c);
>> } else if (u < 0x10000) {
>>         c[2] = 0x80 | (u & 0x3F);
>>         u >>= 6;
>>         c[1] = 0x80 | (u & 0x3F);
>>         u >>= 6;
>>         c[0] = 0xE0 | (u & 0x0F);
>>         c[3] = 0;
>>         str.append(c);
>> } else if (u < 0x200000) {
>>         c[3] = 0x80 | (u & 0x3F);
>>         u >>= 6;
>>         c[2] = 0x80 | (u & 0x3F);
>>         u >>= 6;
>>         c[1] = 0x80 | (u & 0x3F);
>>         u >>= 6;
>>         c[0] = 0xF0 | (u & 0x07);
>>         c[4] = 0;
>>         str.append(c);
>> } else {
>>         throw invalid;
>> }
>
> I did add a utf16_to_utf8() (based on code from grub) as part of the
> efi-variables patch, since there we are dealing with utf16 coming from
> outside of grub. I guess I could use that. I think that mostly
> matters if we end up printing strings that originate outside of
> u-boot, but I guess that will be the case for filenames in a
> device-path.
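
For illustration, here is a minimal C sketch of the approach discussed
above: take a char16_t string (such as a u"..." literal) and convert it
to UTF-8, combining surrogate pairs so that code points above 0xFFFF
(like the Takri letters mentioned earlier) survive. The name
utf16_to_utf8_sketch and its signature are assumptions made for this
example; it is not the utf16_to_utf8() from the efi-variables patch.
Replacing unpaired surrogates with U+FFFD is just one possible policy
and not something this thread settles.

```c
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>	/* char16_t, C11 */

/*
 * Hypothetical helper (name and signature are assumptions for this sketch).
 * Converts the NUL-terminated UTF-16 string src to UTF-8 in dst.
 * At most dst_size bytes are written, including the terminating NUL.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * if dst is too small; in that case dst holds a truncated, terminated string.
 */
static size_t utf16_to_utf8_sketch(char *dst, size_t dst_size,
				   const char16_t *src)
{
	size_t n = 0;

	if (dst_size == 0)
		return (size_t)-1;

	while (*src) {
		uint32_t u = *src++;
		size_t need;

		if (u >= 0xD800 && u <= 0xDBFF &&
		    *src >= 0xDC00 && *src <= 0xDFFF) {
			/* High surrogate followed by low surrogate: combine. */
			u = 0x10000 + ((u - 0xD800) << 10) + (*src++ - 0xDC00);
		} else if (u >= 0xD800 && u <= 0xDFFF) {
			/* Unpaired surrogate: substitute U+FFFD (assumed policy). */
			u = 0xFFFD;
		}

		need = u < 0x80 ? 1 : u < 0x800 ? 2 : u < 0x10000 ? 3 : 4;
		if (n + need >= dst_size) {
			/* No room for this character plus the NUL. */
			dst[n] = '\0';
			return (size_t)-1;
		}

		if (need == 1) {
			dst[n++] = (char)u;
		} else if (need == 2) {
			dst[n++] = (char)(0xC0 | (u >> 6));
			dst[n++] = (char)(0x80 | (u & 0x3F));
		} else if (need == 3) {
			dst[n++] = (char)(0xE0 | (u >> 12));
			dst[n++] = (char)(0x80 | ((u >> 6) & 0x3F));
			dst[n++] = (char)(0x80 | (u & 0x3F));
		} else {
			dst[n++] = (char)(0xF0 | (u >> 18));
			dst[n++] = (char)(0x80 | ((u >> 12) & 0x3F));
			dst[n++] = (char)(0x80 | ((u >> 6) & 0x3F));
			dst[n++] = (char)(0x80 | (u & 0x3F));
		}
	}

	dst[n] = '\0';
	return n;
}
```

Called as utf16_to_utf8_sketch(buf, sizeof(buf), u"Hello world"), this
fills buf with UTF-8 bytes suitable for console output, and the width of
the input does not depend on -fshort-wchar.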
>
> BR,
> -R
>
>> Best regards
>>
>> Heinrich
>>
>>>>> +        while (len < field_width--)
>>>>> +                ADDCH(buf, ' ');
>>>>> +        return buf;
>>>>> +}
>>>>> +
>>>>>  #ifdef CONFIG_CMD_NET
>>>>>  static const char hex_asc[] = "0123456789abcdef";
>>>>>  #define hex_asc_lo(x)   hex_asc[((x) & 0x0f)]
>>>>> @@ -528,8 +557,14 @@ repeat:
>>>>>                          continue;
>>>>>
>>>>>                  case 's':
>>>>> -                        str = string(str, end, va_arg(args, char *),
>>>>> -                                     field_width, precision, flags);
>>>>> +                        if (qualifier == 'l') {
>>>>
>>>> According to ISO 9899:1999 %ls is used to indicate a wchar_t string,
>>>> which may be u32 * or u16 * depending on GCC flag -fshort-wchar.
>>>>
>>>> Wouldn't it make sense to use some other notation, e.g. %S, to indicate
>>>> that we explicitly mean u16 *?
>>>>
>>>> Please, add a comment into the code indicating why we need u16 * support
>>>> referring to the UEFI spec.
>>>>
>>>> Best regards
>>>>
>>>> Heinrich
>>>>
>>>>> +                                str = string16(str, end, va_arg(args, u16 *),
>>>>> +                                               field_width, precision, flags);
>>>>> +
>>>>> +                        } else {
>>>>> +                                str = string(str, end, va_arg(args, char *),
>>>>> +                                             field_width, precision, flags);
>>>>> +                        }
>>>>>                          continue;
>>>>>
>>>>>                  case 'p':
>>>>>
>>>>
>>>
>>
>
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot