FWIW...

At Fri, 23 Apr 2021 00:17:35 -0400, Tom Lane <t...@sss.pgh.pa.us> wrote in
> Kyotaro Horiguchi <horikyota....@gmail.com> writes:
> > At Thu, 22 Apr 2021 23:17:19 -0400, Tom Lane <t...@sss.pgh.pa.us> wrote in
> >> Doesn't seem like a good idea, because that locks us into an assumption
> >> that the downcasing conversion doesn't change the string's physical
> >> length.  There are a lot of counterexamples to that :-(.  I'm not sure
>
> > Mmm. I didn't know of that.
>
> The two examples I know of offhand are in German (eszett "ß" downcases to
> "ss") and Turkish (dotted "İ" downcases to "i", likewise dotless "I"

According to Wikipedia, "ss" is equivalent to "ß", and their upper-case
forms are "SS" and "ẞ" respectively. (I didn't even know "ẞ" existed.
AFAIK no word begins with eszett, but it seems "ẞ" does appear when a
word is written entirely in capital letters.)

> downcases to "ı"; one of each of those pairs is an ASCII letter, the
> other is not).  Depending on which encoding is in use, these

Upper-case dotless "I" and lower-case dotted "i" are the ones in ASCII
(or the English alphabet?).  That's interesting.

> transformations *could* be the same number of bytes, but they could
> equally well not be.  There are probably other examples.

Yeah. Agreed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
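
P.S. The byte-length point is easy to check outside the server. Below is a minimal sketch in Python, not PostgreSQL code: the helper name byte_lengths is made up for illustration, and the Turkish pairs are hard-coded because Python's str.lower() does not apply locale-specific mappings.

```python
def byte_lengths(before, after, encoding):
    """Return (bytes before, bytes after) for a case-converted pair."""
    return len(before.encode(encoding)), len(after.encode(encoding))


# (description, original string, case-converted string)
pairs = [
    # German: capital eszett U+1E9E lower-cases to U+00DF
    ("capital eszett -> eszett", "\u1e9e", "\u1e9e".lower()),
    # German: eszett U+00DF upper-cases to the two letters "SS"
    ("eszett -> SS", "\u00df", "\u00df".upper()),
    # Turkish: dotted capital I (U+0130) -> ASCII "i" (hard-coded mapping)
    ("dotted I -> i", "\u0130", "i"),
    # Turkish: ASCII "I" -> dotless small i (U+0131) (hard-coded mapping)
    ("I -> dotless i", "I", "\u0131"),
]

for label, before, after in pairs:
    for encoding in ("utf-8", "iso8859_9"):
        try:
            n_before, n_after = byte_lengths(before, after, encoding)
        except UnicodeEncodeError:
            # Capital eszett has no code point in ISO-8859-9
            print(f"{label:26} {encoding:10} not representable")
            continue
        print(f"{label:26} {encoding:10} {n_before} -> {n_after} bytes")
```

With UTF-8 the Turkish pairs change length in both directions (2 -> 1 and 1 -> 2 bytes), while in ISO-8859-9 they stay at one byte each; "ß" -> "SS" is the reverse, 2 -> 2 bytes in UTF-8 but 1 -> 2 in ISO-8859-9. So whether the length changes really does depend on the encoding in use.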