On 26/11/2015 13:15, Chris Angelico wrote:
On Thu, Nov 26, 2015 at 11:53 PM, BartC <b...@freeuk.com> wrote:
http://pastebin.com/JrVTher6
#14 and #15: Are you assuming that a character is a byte and that diacritical-free English is the only language in the world?
I don't think that need be the assumption. Any UTF-8 string that fits within 8 bytes could also be represented by a 64-bit integer value.
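A minimal sketch of that idea in Python, purely as an illustration: the helper names pack_utf8 and unpack_utf8 are made up for this example, and the little-endian byte order is an arbitrary choice.

```python
def pack_utf8(s: str) -> int:
    """Pack a short string into an integer via its UTF-8 bytes (hypothetical helper)."""
    data = s.encode("utf-8")
    if len(data) > 8:
        # The limit is on encoded bytes, not on characters.
        raise ValueError("string needs more than 8 bytes in UTF-8")
    return int.from_bytes(data, "little")


def unpack_utf8(n: int) -> str:
    """Inverse of pack_utf8 for strings that do not end in NUL characters."""
    length = (n.bit_length() + 7) // 8  # round up to whole bytes
    return n.to_bytes(length, "little").decode("utf-8")


word = "na\u00efve"          # 5 characters, 6 bytes in UTF-8
packed = pack_utf8(word)     # still fits in a 64-bit integer
restored = unpack_utf8(packed)
```

The catch is that the budget is 8 bytes rather than 8 characters, so three CJK characters (9 bytes) already overflow, and trailing NUL characters are lost on the round trip with this byte order.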
Case insensitivity is a *pain* when you try to be language-agnostic; for instance, the case-folding rules of English state that U+0069 LATIN SMALL LETTER I and U+0049 LATIN CAPITAL LETTER I are identical, but Turkish would upper-case the first to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE and lower-case the second to U+0131 LATIN SMALL LETTER DOTLESS I. German has U+00DF LATIN SMALL LETTER SHARP S (also called eszett), which traditionally upper-cases to "SS", which lower-cases to "ss".
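These mappings can be checked against Python 3's str methods, which apply the Unicode default (locale-independent) case mappings, so the Turkish-specific behaviour is not what you get. The helper same_ignoring_case is a hypothetical name used only for this sketch.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison using Unicode default case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


sharp_s_upper = "\u00df".upper()         # sharp s upper-cases to "SS"
round_trip = sharp_s_upper.lower()       # "ss", not the original sharp s

# Without Turkish tailoring, the dotted capital I lower-cases to a plain "i"
# followed by U+0307 COMBINING DOT ABOVE, and the dotless small i
# upper-cases to a plain "I".
dotted_capital_lower = "\u0130".lower()
dotless_small_upper = "\u0131".upper()

german_match = same_ignoring_case("Stra\u00dfe", "STRASSE")
```

Note that upper-casing and then lower-casing does not return the original string, which is why casefold() rather than lower() is the usual tool for caseless comparison.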
I use Windows, which is also case-insensitive with regard to filenames and such. How does it solve those problems? How about web-site names, email addresses and Google searches?
Within program source code (where you have mainly technical users), you can just impose some restrictions on keywords and identifiers. Otherwise, if you want to allow Unicode here, there are plenty of problems even without case switching.
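As a rough illustration of both halves of that, here is a sketch in Python. The ASCII-only rule and the name is_restricted_identifier are assumptions made for this example, not a description of any particular language, and the two look-alike cases are just examples of the kind of problem that can arise.

```python
import unicodedata


def is_restricted_identifier(name: str) -> bool:
    """Accept only ASCII identifiers (one possible restriction, chosen for illustration)."""
    return name.isascii() and name.isidentifier()


def nfkc(name: str) -> str:
    """Compatibility-normalize a name, as Python itself does for identifiers."""
    return unicodedata.normalize("NFKC", name)


latin = "file"
ligature = "\ufb01le"    # starts with U+FB01 LATIN SMALL LIGATURE FI
cyrillic_a = "\u0430"    # CYRILLIC SMALL LETTER A, looks like Latin "a"

ligature_differs = ligature != latin          # different code points...
ligature_unifies = nfkc(ligature) == latin    # ...but the same name after NFKC
homoglyph_survives = nfkc(cyrillic_a) != "a"  # normalization does not merge look-alikes
```

With the restriction in place, both ligature and cyrillic_a are simply rejected, so neither question comes up.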
-- 
Bartc