On 16 April 2017 at 19:16, Luc Danton wrote:
> On 07/04/17 13:29 +0100, Jonathan Wakely wrote:
>>
>> +             constexpr char __abc[] = "abcdefghijklmnopqrstuvwxyz";
>> +             unsigned char __lc = std::tolower(__c);
>> +             constexpr bool __consecutive = __consecutive_chars(__abc,
>> 26);
>> +             if _GLIBCXX17_CONSTEXPR (__consecutive)
>> +               {
>> +                 // Characters 'a'..'z' are consecutive
>> +                 if (std::isalpha(__c) && (__lc - 'a') < __b)
>> +                   __c = __lc - 'a' + 10;
>
> <snip>
>
> Is that alright? I have my eyes set on the following line in particular:
>
>> +             unsigned char __lc = std::tolower(__c);
>
>
> Correct me if I'm wrong but that's locale sensitive. As it turns out in
> a Turkish locale the lowercase for <I> is <ı> "U+0131 LATIN SMALL LETTER
> DOTLESS I".

Good catch, thanks.

>
> I installed a Turkish locale on an old system of mine (Linux, UTF-8) and
> sketched out the lines to quickly test the logic. As best as I can tell
> the following happens when the global locale is set to Turkish:
>
>  - __lc is 'I' unchanged because <ı> U+0131 would encode to two UTF-8
>    codepoints aka can't fit in a char
>  - the test succeeds
>  - the computed value that ends up being written to __c is decimal -14
>
> Keep in mind I didn't run the actual code from the patch, so I may have
> got something wrong. Still, the Turkish i/İ and ı/I pairs are just one
> example in one locale. Are std::tolower and std::isalpha the right tools
> for this job? (These are the only two locale sensitive parts I
> noticed in the code.)

Yeah, I'll avoid isalpha and tolower.

Reply via email to