On May 13, 11:44 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> PEP 1 specifies that PEP authors need to collect feedback from the
> community. As the author of PEP 3131, I'd like to encourage comments
> to the PEP included below, either here (comp.lang.python), or to
> [EMAIL PROTECTED]
>
> In summary, this PEP proposes to allow non-ASCII letters as
> identifiers in Python. If the PEP is accepted, the following
> identifiers would also become valid as class, function, or
> variable names: Löffelstiel, changé, ошибка, or 売り場
> (hoping that the latter one means "counter").
>
> I believe this PEP differs from other Py3k PEPs in that it really
> requires feedback from people with different cultural background
> to evaluate it fully - most other PEPs are culture-neutral.
>
> So, please provide feedback, e.g. perhaps by answering these
> questions:
> - should non-ASCII identifiers be supported? why?
> - would you use them if it was possible to do so? in what cases?
>
> Regards,
> Martin
>
> PEP: 3131
> Title: Supporting Non-ASCII Identifiers
> Version: $Revision: 55059 $
> Last-Modified: $Date: 2007-05-01 22:34:25 +0200 (Di, 01 Mai 2007) $
> Author: Martin v. Löwis <[EMAIL PROTECTED]>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 1-May-2007
> Python-Version: 3.0
> Post-History:
>
> Abstract
> ========
>
> This PEP suggests to support non-ASCII letters (such as accented
> characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.
>
> Rationale
> =========
>
> Python code is written by many people in the world who are not familiar
> with the English language, or even well-acquainted with the Latin
> writing system. Such developers often desire to define classes and
> functions with names in their native languages, rather than having to
> come up with an (often incorrect) English translation of the concept
> they want to name.
>
> For some languages, common transliteration systems exist (in particular,
> for the Latin-based writing systems). For other languages, users have
> larger difficulties to use Latin to write their native words.
>
> Common Objections
> =================
>
> Some objections are often raised against proposals similar to this one.
>
> People claim that they will not be able to use a library if to do so
> they have to use characters they cannot type on their keyboards.
> However, it is the choice of the designer of the library to decide on
> various constraints for using the library: people may not be able to use
> the library because they cannot get physical access to the source code
> (because it is not published), or because licensing prohibits usage, or
> because the documentation is in a language they cannot understand. A
> developer wishing to make a library widely available needs to make a
> number of explicit choices (such as publication, licensing, language
> of documentation, and language of identifiers). It should always be the
> choice of the author to make these decisions - not the choice of the
> language designers.
>
> In particular, projects wishing to have wide usage probably might want
> to establish a policy that all identifiers, comments, and documentation
> is written in English (see the GNU coding style guide for an example of
> such a policy). Restricting the language to ASCII-only identifiers does
> not enforce comments and documentation to be English, or the identifiers
> actually to be English words, so an additional policy is necessary,
> anyway.
>
> Specification of Language Changes
> =================================
>
> The syntax of identifiers in Python will be based on the Unicode
> standard annex UAX-31 [1]_, with elaboration and changes as defined
> below.
>
> Within the ASCII range (U+0001..U+007F), the valid characters for
> identifiers are the same as in Python 2.5. This specification only
> introduces additional characters from outside the ASCII range. For
> other characters, the classification uses the version of the Unicode
> Character Database as included in the ``unicodedata`` module.
>
> The identifier syntax is ``<ID_Start> <ID_Continue>*``.
>
> ``ID_Start`` is defined as all characters having one of the general
> categories uppercase letters (Lu), lowercase letters (Ll), titlecase
> letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers
> (Nl), plus the underscore (XXX what are "stability extensions" listed in
> UAX 31).
>
> ``ID_Continue`` is defined as all characters in ``ID_Start``, plus
> nonspacing marks (Mn), spacing combining marks (Mc), decimal number
> (Nd), and connector punctuations (Pc).
>
> All identifiers are converted into the normal form NFC while parsing;
> comparison of identifiers is based on NFC.
>
> Policy Specification
> ====================
>
> As an addition to the Python Coding style, the following policy is
> prescribed: All identifiers in the Python standard library MUST use
> ASCII-only identifiers, and SHOULD use English words wherever feasible.
>
> As an option, this specification can be applied to Python 2.x. In that
> case, ASCII-only identifiers would continue to be represented as byte
> string objects in namespace dictionaries; identifiers with non-ASCII
> characters would be represented as Unicode strings.
>
> Implementation
> ==============
>
> The following changes will need to be made to the parser:
>
> 1. If a non-ASCII character is found in the UTF-8 representation of the
>    source code, a forward scan is made to find the first ASCII
>    non-identifier character (e.g. a space or punctuation character)
>
> 2. The entire UTF-8 string is passed to a function to normalize the
>    string to NFC, and then verify that it follows the identifier syntax.
>    No such callout is made for pure-ASCII identifiers, which continue to
>    be parsed the way they are today.
>
> 3. If this specification is implemented for 2.x, reflective libraries
>    (such as pydoc) must be verified to continue to work when Unicode
>    strings appear in ``__dict__`` slots as keys.
>
> References
> ==========
>
> .. [1] http://www.unicode.org/reports/tr31/
>
> Copyright
> =========
>
> This document has been placed in the public domain.
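To make the quoted identifier rule concrete, here is a rough sketch of how I read it, written against Python 3's ``unicodedata`` module. The helper names (is_id_start, is_id_continue, normalize_identifier) are my own and not part of the PEP, and the sketch ignores keywords as well as the "stability extensions" that the draft still marks as an open question.

```python
import unicodedata

# General categories named in the draft specification.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """True if ch may begin an identifier (letters, letter numbers, underscore)."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch):
    """True if ch may appear after the first character of an identifier."""
    return is_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_EXTRA


def normalize_identifier(name):
    """Return the NFC form of name if it matches <ID_Start> <ID_Continue>*,
    otherwise None. Hypothetical helper, for illustration only."""
    name = unicodedata.normalize("NFC", name)
    if not name or not is_id_start(name[0]):
        return None
    if not all(is_id_continue(ch) for ch in name[1:]):
        return None
    return name
```

Because normalization happens first, a name typed with a combining diaeresis and the same name typed with a precomposed "ö" come out as one identifier, which is what the NFC paragraph in the draft asks for.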
I don't think that supporting non-ASCII characters in identifiers would cause any problems. Most people won't use them anyway. People who use non-English identifiers in their project and hope for it to become popular worldwide will probably just fail because of that foolish coding-style choice. I put that kind of choice in the same ballpark as deciding to use Hungarian notation for Python code.

As for malicious patch submissions, I think this is a non-issue. A tool that detects any identifier containing non-ASCII characters in a file should be a trivial script to write.

I say that if there is a demand for it, let's do it.
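
For what it's worth, here is roughly what I have in mind for the detection script. It is only a sketch, and it assumes an interpreter whose ``tokenize`` module already accepts non-ASCII names (Python 3.7 or later, since it also relies on ``str.isascii``). The function name is made up for this example.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Yield (line, column, name) for every NAME token in source that
    contains a non-ASCII character. Comments and string literals are
    separate token types, so they are not reported."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            line, column = tok.start
            yield line, column, tok.string
```

A project that wants ASCII-only names could run something like this over each submitted file and reject the patch if it yields anything, while still allowing non-ASCII text in comments and strings.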