On May 13, 5:44 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> In summary, this PEP proposes to allow non-ASCII letters as
> identifiers in Python. If the PEP is accepted, the following
> identifiers would also become valid as class, function, or
> variable names: Löffelstiel, changé, ошибка, or 売り場
> (hoping that the latter one means "counter").

I am strongly against this PEP. The serious problems and huge costs already explained by others are not balanced by the possibility of using non-butchered identifiers in non-ASCII alphabets, especially considering that one can write any language, in its full Unicode glory, in the strings and comments of suitably encoded source files.
The diatribe about cross-language understanding of Python code is IMHO off topic; if one doesn't care about international readers, using annoying alphabets for identifiers has only a marginal impact. It's the same situation as IRIs (a bad idea) versus HTML text (happily Unicode).

> - should non-ASCII identifiers be supported? why?

No, they are useless.

> - would you use them if it was possible to do so? in what cases?

No, never.
Being Italian, I'm sometimes tempted to use accented vowels in my code, but I restrain myself because of the possibility of annoying foreign readers and the difficulty of convincing every text editor I use to preserve them.

> Python code is written by many people in the world who are not familiar
> with the English language, or even well-acquainted with the Latin
> writing system. Such developers often desire to define classes and
> functions with names in their native languages, rather than having to
> come up with an (often incorrect) English translation of the concept
> they want to name.

The described set of users includes linguistically intolerant people who don't accept the use of suitable languages instead of their own, and of compromised but readable spelling instead of the one they prefer.
Most "people in the world who are not familiar with the English language" are much more mature than that, even when they don't write for international readers.

> The syntax of identifiers in Python will be based on the Unicode
> standard annex UAX-31 [1]_, with elaboration and changes as defined
> below.

Not providing an explicit listing of allowed characters is inexcusable sloppiness. The XML standard is an example of how listings of large parts of the Unicode character set can be provided clearly, exactly and (almost) concisely.

> ``ID_Start`` is defined as all characters having one of the general
> categories uppercase letters (Lu), lowercase letters (Ll), titlecase
> letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers
> (Nl), plus the underscore (XXX what are "stability extensions" listed in
> UAX 31).
>
> ``ID_Continue`` is defined as all characters in ``ID_Start``, plus
> nonspacing marks (Mn), spacing combining marks (Mc), decimal number
> (Nd), and connector punctuations (Pc).

Am I the first to notice how unsuitable these characters are? Many of these would be utterly invisible ("variation selectors" are Mn) or displayed out of sequence (overlays are Mn), or normalized away (combining accents are Mn) or absurdly strange and ambiguous (roman numerals are Nl, for instance).

Lorenzo Gatti
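
For anyone who wants to check the category claims above, here is a minimal sketch using Python's standard unicodedata module. It approximates the quoted ``ID_Start`` and ``ID_Continue`` definitions by general category only and ignores the "elaboration and changes" and the "stability extensions" the PEP text mentions, so it should not be read as the PEP's actual rule. The helper names (is_id_start, is_id_continue, is_identifier) and the sample code points are hypothetical choices made for illustration.

```python
import unicodedata

# General categories taken from the quoted PEP text (approximation only;
# the PEP's "elaboration and changes" are deliberately left out).
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """True if ch could start an identifier under the quoted definition."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch):
    """True if ch could appear after the first character."""
    return is_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_EXTRA


def is_identifier(name):
    """Category-only check of a whole name (hypothetical helper)."""
    return (
        bool(name)
        and is_id_start(name[0])
        and all(is_id_continue(ch) for ch in name[1:])
    )


# Sample code points chosen for illustration, one per complaint above.
SAMPLES = {
    "variation selector-16": "\ufe0f",
    "combining long solidus overlay": "\u0338",
    "combining acute accent": "\u0301",
    "roman numeral one": "\u2160",
}

for label, ch in SAMPLES.items():
    # Print the code point, its label, and its general category.
    print(f"U+{ord(ch):04X} {label}: {unicodedata.category(ch)}")

# A name carrying an invisible variation selector passes the category check.
print(is_identifier("x\ufe0f"))

# "e" + combining acute accent composes to a single code point under NFC.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")
```

Under this category-only reading, the PEP's own examples (Löffelstiel, changé, ошибка, 売り場) are accepted, and so are a name ending in a variation selector and a name consisting of a single roman numeral character.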