Hi, I am a self-taught Turkish Python user. Personally, I don't think I am in a position to discuss an issue of this scale, but in my opinion the Pardus* developers should be invited to join this discussion. As they use Python heavily in most of their projects**, I think they would have something valuable to say about this subject. Here is the Pardus developers' mailing list: http://liste.pardus.org.tr/mailman/listinfo/pardus-devel
As for me, I always expect the Turkish locale to cause problems, and I use workarounds where necessary. For example, if I needed to match a lower-case or upper-case Turkish "i", I would probably go with [iİ] with the Unicode flag.

*) A Linux distribution developed by the Scientific & Technological Research Council of Turkey.
**) http://developer.pardus.org.tr/projects/index.html

2011/9/15 MRAB <pyt...@mrabarnett.plus.com>
> On 15/09/2011 14:44, John-John Tedro wrote:
>> On Thu, Sep 15, 2011 at 1:16 PM, Alan Plum <m...@alanplum.com> wrote:
>>> On 2011-09-15 15:02, MRAB wrote:
>>>> The regex module at http://pypi.python.org/pypi/regex currently uses a
>>>> compromise, where it matches 'I' with 'i' and also 'I' with 'ı' and
>>>> 'İ' with 'i'.
>>>>
>>>> I was wondering if it would be preferable to have a TURKIC flag
>>>> instead ("(?T)" or "(?T:...)" in the pattern).
>>>
>>> I think the problem many people ignore when coming up with solutions
>>> like this is that while this behaviour is pretty much unique to
>>> Turkish script, there is no guarantee that Turkish substrings won't
>>> appear in other language strings (or vice versa).
>>>
>>> For example, foreign names in Turkish are often given as spelled in
>>> their native (non-Turkish) script variants. Likewise, Turkish names
>>> in other languages are often given as spelled in Turkish.
>>>
>>> The Turkish 'I' is a peculiarity that will probably haunt us
>>> programmers until hell freezes over. Unless Turkey abandons its
>>> traditional orthography or people start speaking only a single
>>> language at a time (including names), there's no easy way to deal
>>> with this.
>>>
>>> In other words: the only way to make use of your proposed flag is if
>>> you have a fully language-tagged input (e.g. an XML document making
>>> extensive use of xml:lang) and only ever apply regular expressions
>>> to substrings containing one culture at a time.
>>>
>>> --
>>> http://mail.python.org/mailman/listinfo/python-list
>>
>> Python does not appear to support special case mappings; in effect, it
>> is not 100% compliant with the Unicode standard.
>>
>> The locale-specific 'i' casing in Turkic is mentioned in section 5.18
>> (Case Mappings) of the Unicode standard:
>> http://www.unicode.org/versions/Unicode6.0.0/ch05.pdf#G21180
>>
>> AFAIK, the case methods of Python strings seem to be built around the
>> assumption that len("string") == len("string".upper()), but some of
>> these casing rules require that the string grow, like uppercasing the
>> German sharp s "ß", which should be translated to the expanded string
>> "SS". These special cases should be triggered on specific locales, but
>> I have not been able to verify that the Turkic uppercasing of "i"
>> works on either Python 2.6, 2.7 or 3.1:
>>
>>     import locale
>>     # warning: requires a Turkish locale on your system
>>     locale.setlocale(locale.LC_ALL, "tr_TR.utf8")
>>     ord("i".upper()) == 0x130  # is False for me, but should be True
>>
>> I wouldn't be surprised if these issues are translated into the 're'
>> module.
>
> There has been some discussion on the Python-dev list about improving
> Unicode support in Python 3.
>
> It's somewhat unlikely that Unicode will become locale-dependent in
> Python because it would cause problems; you don't want:
>
>     "i".upper() == "I"
>
> to be maybe true, maybe false.
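For reference, later Python 3 releases behave the way MRAB describes: since Python 3.3, str.upper() and str.lower() apply Unicode's full (unconditional) case mappings, so the result can change length ("ß" becomes "SS"), but they never consult the current locale. The sketch below illustrates that, assuming Python 3.3 or later; turkish_upper() and turkish_lower() are hypothetical helpers written for this example, not part of Python, that apply the Turkish/Azeri dotted/dotless i rules explicitly instead of through a locale setting.

```python
# Minimal sketch, assuming Python 3.3+: str case methods use Unicode's
# full case mappings (length may change) but ignore the current locale.
# turkish_upper/turkish_lower are hypothetical helpers for illustration.

DOTTED_CAPITAL_I = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # ı, LATIN SMALL LETTER DOTLESS I


def turkish_upper(text: str) -> str:
    """Uppercase with the Turkish rule i -> İ (ı -> I is already the default)."""
    return text.replace("i", DOTTED_CAPITAL_I).upper()


def turkish_lower(text: str) -> str:
    """Lowercase with the Turkish rules İ -> i and I -> ı."""
    # Map the two special capitals first: the default mapping would turn
    # "İ" into "i" + U+0307 COMBINING DOT ABOVE and "I" into a dotted "i".
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


if __name__ == "__main__":
    # Full, length-changing mapping from Unicode SpecialCasing: ß -> SS.
    print("ß".upper(), len("ß"), len("ß".upper()))
    # Default mappings are the same whatever locale is set.
    print("i".upper() == "I", DOTLESS_SMALL_I.upper() == "I")
    # Explicit Turkish mappings.
    print(turkish_upper("istanbul"), turkish_lower("DİYARBAKIR"))
```

Making the Turkish behaviour an explicit function call, rather than a side effect of setlocale(), keeps "i".upper() == "I" always true, which is the property MRAB wants to preserve.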
>
> An option would be to specify whether it should be locale-dependent.
>
>> The only support appears to be the 'L' switch, but it only makes "\w,
>> \W, \b, \B, \s and \S dependent on the current locale".
>
> That flag is for locale-dependent 8-bit encodings. The ASCII (Python
> 3), LOCALE and UNICODE flags are mutually exclusive.
>
>> Which probably does not yield to the special rules mentioned above, but
>> I could be wrong. Make sure that your locale is correct and test again.
>>
>> If you are unsuccessful, I don't see a 'Turkic flag' being introduced
>> into the re module any time soon, given the following from PEP 20:
>> "Special cases aren't special enough to break the rules"
>
> That's why I'm interested in the view of Turkish users. The rest of us
> will probably never have to worry about it! :-)
>
> (There's a report in the Python bug tracker about this issue, which is
> why the regex module has the compromise.)

--
http://yasar.serveblog.net/
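The [iİ] character-class workaround mentioned at the top of this message can be generalised without any new regex flag. The following is a minimal sketch, assuming Python 3, where str patterns are Unicode-aware by default; turkish_ci_pattern() is a hypothetical helper, not part of re or of the regex module. It deliberately avoids re.IGNORECASE, because case-insensitive matching pairs "i" with "I" again, and instead spells out each letter's Turkish lower/upper pair as a character class.

```python
import re

# Hypothetical helper: build a pattern that matches `literal`
# case-insensitively under Turkish rules (i <-> İ, ı <-> I) without
# re.IGNORECASE, which would also pair "i" with "I".
TURKISH_PAIRS = {
    "i": "[i\u0130]", "\u0130": "[i\u0130]",  # dotted:  i / İ
    "\u0131": "[\u0131I]", "I": "[\u0131I]",  # dotless: ı / I
}


def turkish_ci_pattern(literal: str) -> str:
    parts = []
    for ch in literal:
        if ch in TURKISH_PAIRS:
            parts.append(TURKISH_PAIRS[ch])
            continue
        lo, up = ch.lower(), ch.upper()
        if lo != up and len(lo) == 1 and len(up) == 1:
            # Cased letter with single-character mappings, e.g. ş / Ş.
            # dict.fromkeys removes duplicates while keeping order.
            parts.append("[" + "".join(dict.fromkeys(lo + ch + up)) + "]")
        else:
            # Digits, punctuation, or letters such as ß whose full case
            # mapping is not one-to-one: match them literally.
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    pattern = re.compile(turkish_ci_pattern("Işık"))
    for word in ["IŞIK", "ışık", "ISIK", "işık"]:
        print(word, bool(pattern.fullmatch(word)))
```

As Alan Plum points out in the quoted thread, this only helps when you already know the text is Turkish; applied to mixed-language input, it will still mismatch names spelled in another orthography.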