On 16 April 2017 at 19:16, Luc Danton wrote: > On 07/04/17 13:29 +0100, Jonathan Wakely wrote: >> >> + constexpr char __abc[] = "abcdefghijklmnopqrstuvwxyz"; >> + unsigned char __lc = std::tolower(__c); >> + constexpr bool __consecutive = __consecutive_chars(__abc, >> 26); >> + if _GLIBCXX17_CONSTEXPR (__consecutive) >> + { >> + // Characters 'a'..'z' are consecutive >> + if (std::isalpha(__c) && (__lc - 'a') < __b) >> + __c = __lc - 'a' + 10; > > <snip> > > Is that alright? I have my eyes set on the following line in particular: > >> + unsigned char __lc = std::tolower(__c); > > > Correct me if I'm wrong but that's locale sensitive. As it turns out in > a Turkish locale the lowercase for <I> is <ı> "U+0131 LATIN SMALL LETTER > DOTLESS I".
Good catch, thanks. > > I installed a Turkish locale on an old system of mine (Linux, UTF-8) and > sketched out the lines to quickly test the logic. As best as I can tell > the following happens when the global locale is set to Turkish: > > - __lc is 'I' unchanged because <ı> U+0131 would encode to two UTF-8 > codepoints aka can't fit in a char > - the test succeeds > - the computed value that ends up being written to __c is decimal -14 > > Keep in mind I didn't run the actual code from the patch, so I may have > got something wrong. Still, the Turkish i/İ and ı/I pairs are just one > example in one locale. Are std::tolower and std::isalpha the right tools > for this job? (These are the only two locale sensitive parts I > noticed in the code.) Yeah, I'll avoid isalpha and tolower.