Re: Why __slots__ slows down attribute access?
On Tue, Aug 23, 2011 at 12:26 PM, Peter Otten <__pete...@web.de> wrote:

> Jack wrote:
>
> > People have illusion that it is faster to visit the attribute defined
> > by __slots__ .
> > http://groups.google.com/group/comp.lang.python/msg/c4e413c3d86d80be
> >
> > That is wrong. The following tests show it is slower.
>
> Not so fast. Here's what I get (python2.6.4, 64 bit):
>
> $ python -mtimeit -s "class A(object): __slots__ = ('a', 'b', 'c')" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
> 100 loops, best of 3: 0.324 usec per loop
>
> $ python -mtimeit -s "class A(object): pass" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
> 100 loops, best of 3: 0.393 usec per loop
>
> Now what?

This is what I get on 64-bit Linux 2.6.39.

script:

    for v in 2.6 2.7 3.2; do
        python$v --version
        echo -n "(slots) = "
        python$v -mtimeit -s "class A(object): __slots__ = ('a', 'b', 'c')" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
        echo -n "(regular) = "
        python$v -mtimeit -s "class A(object): pass" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
    done

output:

    Python 2.6.5
    (slots) = 100 loops, best of 3: 0.219 usec per loop
    (regular) = 100 loops, best of 3: 0.231 usec per loop
    Python 2.7.2
    (slots) = 100 loops, best of 3: 0.244 usec per loop
    (regular) = 100 loops, best of 3: 0.285 usec per loop
    Python 3.2
    (slots) = 100 loops, best of 3: 0.193 usec per loop
    (regular) = 100 loops, best of 3: 0.224 usec per loop

--
John-John Tedro
--
http://mail.python.org/mailman/listinfo/python-list
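For anyone who would rather run the same comparison from inside Python than from the shell, here is a minimal sketch using the standard `timeit` module. It assumes a current Python 3 (the `globals` argument of `timeit.repeat` is not available on the versions timed above). The names `Slotted`, `Regular`, `STATEMENT` and `best_usec_per_loop` are made up for this illustration. Results will vary by machine and interpreter, so no expected timings are given.

```python
import timeit


class Slotted(object):
    """Attributes live in fixed slots; instances have no __dict__."""
    __slots__ = ('a', 'b', 'c')


class Regular(object):
    """Attributes live in the per-instance __dict__."""
    pass


# Same statement as in the shell one-liners above.
STATEMENT = "inst.a = 5; inst.b = 6; inst.c = 7"


def best_usec_per_loop(cls, repeat=3, number=100000):
    """Return the best-of-`repeat` time for STATEMENT, in microseconds per loop."""
    inst = cls()
    timings = timeit.repeat(STATEMENT, globals={"inst": inst},
                            repeat=repeat, number=number)
    return min(timings) / number * 1e6


for label, cls in (("slots", Slotted), ("regular", Regular)):
    print("(%s) = %.3f usec per loop" % (label, best_usec_per_loop(cls)))
```

Taking the minimum over several repeats mirrors the "best of 3" that the command-line tool reports.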
Re: How do I automate the removal of all non-ascii characters from my code?
On Mon, Sep 12, 2011 at 8:17 AM, Gary Herron wrote:

> On 09/12/2011 12:49 AM, Alec Taylor wrote:
>
>> Good evening,
>>
>> I have converted ODT to HTML using LibreOffice Writer, because I want
>> to convert from HTML to Creole using python-creole. Unfortunately I
>> get this error: "File "Convert to Creole.py", line 17
>> SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py
>> on line 18, but no encoding declared; see
>> http://www.python.org/peps/pep-0263.html for
>> details".
>>
>> Unfortunately I can't post my document yet (it's a research paper I'm
>> working on), but I'm sure you'll get the same result if you write up a
>> document in LibreOffice Writer and add some End Notes.
>>
>> How do I automate the removal of all non-ascii characters from my code?
>>
>> Thanks for all suggestions,
>>
>> Alec Taylor
>
> This question does not quite make sense. The error message is complaining
> about a python file. What does that file have to do with ODT to HTML
> conversion and LibreOffice?
>
> The error message means the python file (wherever it came from) has a
> non-ascii character (as you noted), and so it needs something to tell it
> what such a character means. (That's what the encoding is.)
>
> A comment like this in line 1 or 2 will specify an encoding:
> # -*- coding: -*-
> but, we'll have to know more about the file "Convert to Creole.py" to guess
> what encoding name should be specified there.
>
> You might try utf-8 or latin-1.

If you are having trouble figuring out which encoding your file has, the "file" util is often a quick and dirty solution.

    #> echo "åäö" > test.txt
    #> file test.txt
    test.txt: UTF-8 Unicode text
    #> iconv test.txt -f utf-8 -t latin1 > test.l1.txt
    #> file test.l1.txt
    test.l1.txt: ISO-8859 text

Note: I use latin1 (iso-8859-1) because it can represent the characters 'å', 'ä', 'ö'. Your encoding might be different depending on what system you are using.

The gist is that if you specify the correct encoding with the "coding" comment mentioned above, your program will probably (ish) run as intended.

--
John-John Tedro
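If the "file" util is not at hand, roughly the same check can be sketched in Python. This is only an illustration, assuming Python 3; `find_non_ascii` and `guess_encoding` are hypothetical helper names, not library functions. Note that latin-1 can decode any byte sequence, so it only works as a last-resort candidate and a match on it proves very little.

```python
def find_non_ascii(data):
    """Return (line, column, byte value) for every non-ASCII byte in `data`."""
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits


def guess_encoding(data, candidates=("ascii", "utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data`, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Made-up sample: a source line containing a UTF-8 curly quote (starts with 0xe2).
sample = b"x = 1\ny = '\xe2\x80\x9c'\n"
for line_no, col, byte in find_non_ascii(sample):
    print("line %d, column %d: byte 0x%02x" % (line_no, col, byte))
print("first encoding that decodes cleanly:", guess_encoding(sample))
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to these helpers. Once the encoding is known, declaring it in the "coding" comment is usually a better fix than stripping the characters.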
Re: Turkic I and re
On Thu, Sep 15, 2011 at 1:16 PM, Alan Plum wrote:

> On 2011-09-15 15:02, MRAB wrote:
>
>> The regex module at http://pypi.python.org/pypi/regex currently
>> uses a
>> compromise, where it matches 'I' with 'i' and also 'I' with 'ı' and 'İ'
>> with 'i'.
>>
>> I was wondering if it would be preferable to have a TURKIC flag instead
>> ("(?T)" or "(?T:...)" in the pattern).
>
> I think the problem many people ignore when coming up with solutions like
> this is that while this behaviour is pretty much unique for Turkish script,
> there is no guarantee that Turkish substrings won't appear in other language
> strings (or vice versa).
>
> For example, foreign names in Turkish are often given as spelled in their
> native (non-Turkish) script variants. Likewise, Turkish names in other
> languages are often given as spelled in Turkish.
>
> The Turkish 'I' is a peculiarity that will probably haunt us programmers
> until hell freezes over. Unless Turkey abandons its traditional orthography
> or people start speaking only a single language at a time (including names),
> there's no easy way to deal with this.
>
> In other words: the only way to make use of your proposed flag is if you
> have a fully language-tagged input (e.g. an XML document making extensive
> use of xml:lang) and only ever apply regular expressions to substrings
> containing one culture at a time.

Python does not appear to support special case mappings; in effect, it is not 100% compliant with the Unicode standard. The locale-specific 'i' casing in Turkic is mentioned in section 5.18 (Case Mappings) of the Unicode standard:

http://www.unicode.org/versions/Unicode6.0.0/ch05.pdf#G21180

AFAIK, the case methods of Python strings seem to be built around the assumption that len("string") == len("string".upper()), but some of these casing rules require that the string grow. One example is uppercasing the German sharp s "ß", which should yield the expanded string "SS".

These special cases should be triggered by specific locales, but I have not been able to verify that the Turkic uppercasing of "i" works on Python 2.6, 2.7 or 3.1:

    import locale

    # Warning: requires the Turkish locale to be installed on your system.
    locale.setlocale(locale.LC_ALL, "tr_TR.utf8")
    ord("i".upper()) == 0x130  # is False for me, but should be True

I wouldn't be surprised if these issues carry over into the 're' module. The only support appears to be the 'L' flag (re.LOCALE), but it only makes "\w, \W, \b, \B, \s and \S dependent on the current locale", which probably does not honor the special rules mentioned above, but I could be wrong.

Make sure that your locale is correct and test again. If you are unsuccessful, I don't see a 'Turkic flag' being introduced into the re module any time soon, given the following from PEP 20: "Special cases aren't special enough to break the rules."

Cheers,

--
John-John Tedro
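To make the special casing rules above concrete, here is a minimal sketch of applying the Turkic mappings by hand before the default case conversion. It assumes Python 3.3 or later, where `str.upper()` applies the locale-independent full case mappings (so "ß" does expand to "SS"); the older versions discussed above may behave differently. `turkic_upper` and `turkic_lower` are hypothetical helpers, not part of the standard library or the `re` module, and they only make sense on text already known to be Turkic, which is exactly the language-tagging problem Alan describes.

```python
# Hypothetical helpers: translate the four Turkic-specific letters first,
# then let the default (locale-independent) case mapping handle the rest.
_TURKIC_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})  # i -> İ, ı -> I
_TURKIC_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})  # İ -> i, I -> ı


def turkic_upper(text):
    """Uppercase `text` using the Turkic dotted/dotless i rules."""
    return text.translate(_TURKIC_UPPER).upper()


def turkic_lower(text):
    """Lowercase `text` using the Turkic dotted/dotless i rules."""
    return text.translate(_TURKIC_LOWER).lower()


# The default str methods ignore the locale entirely:
print(ord("i".upper()) == 0x130)      # default mapping gives plain "I"
print(ord(turkic_upper("i")) == 0x130)
# Full case mapping can change the length of the string:
print(len("\u00df"), len("\u00df".upper()))
```

The translation has to happen before calling `.lower()`, because the default lowercase of "İ" is "i" followed by a combining dot above rather than a plain "i".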
Fwd: Turkic I and re
On Fri, Sep 16, 2011 at 7:25 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> Thomas Rachel wrote:
>
> > On 15.09.2011 15:16, Alan Plum wrote:
> >
> >> The Turkish 'I' is a peculiarity that will probably haunt us programmers
> >> until hell freezes over.
>
> Meh, I don't think it's much more peculiar than any other diacritic issue.
> If I'm German or English, I probably want ö and O to match during
> case-insensitive comparisons, so that Zöe and ZOE match. If I'm Icelandic,
> I don't. I don't really see why Turkic gets singled out.
>
> > That's why it would have been nice if the Unicode guys had defined "both
> > Turkish i-s" at separate codepoints.
> >
> > Then one could have the three pairs
> > I, i ("normal")
> > I (other one), ı
> >
> > and
> >
> > İ, i (the other one).
>
> And then people will say, "How can I match both sorts of dotless uppercase I
> but not dotted I when I'm doing comparisons?"
>
> --
> Steven

Yeah, it's more probable that language conventions and functions grow around characters that look right. No one except developers cares what specific codepoint they have, so soon you would have a mish-mash of special rules converting between each special case.

P.S. Sorry Steven, I missed clicking "reply to all".

--
John-John Tedro