Re: Why __slots__ slows down attribute access?

2011-08-23 Thread John-John Tedro
On Tue, Aug 23, 2011 at 12:26 PM, Peter Otten <__pete...@web.de> wrote:

> Jack wrote:
>
> > People have the illusion that it is faster to access the attributes defined
> > by __slots__.
> > http://groups.google.com/group/comp.lang.python/msg/c4e413c3d86d80be
> >
> > That is wrong. The following tests show it is slower.
>
> Not so fast. Here's what I get (python2.6.4, 64 bit):
>
> $ python -mtimeit -s "class A(object): __slots__ = ('a', 'b', 'c')" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
> 100 loops, best of 3: 0.324 usec per loop
>
> $ python -mtimeit -s "class A(object): pass" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
> 100 loops, best of 3: 0.393 usec per loop
>
> Now what?
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

This is what I get on 64-bit Linux 2.6.39:

script:
for v in 2.6 2.7 3.2; do
  python$v --version
  echo -n "(slots)   = "
  python$v -mtimeit -s "class A(object): __slots__ = ('a', 'b', 'c')" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
  echo -n "(regular) = "
  python$v -mtimeit -s "class A(object): pass" -s "inst = A()" "inst.a=5; inst.b=6; inst.c=7"
done

output:
Python 2.6.5
(slots)   = 100 loops, best of 3: 0.219 usec per loop
(regular) = 100 loops, best of 3: 0.231 usec per loop
Python 2.7.2
(slots)   = 100 loops, best of 3: 0.244 usec per loop
(regular) = 100 loops, best of 3: 0.285 usec per loop
Python 3.2
(slots)   = 100 loops, best of 3: 0.193 usec per loop
(regular) = 100 loops, best of 3: 0.224 usec per loop
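The same comparison can also be scripted from Python with the standard timeit module. This is a minimal sketch, assuming Python 2.7 or 3.x; the helper name best_usec_per_loop and the loop count of 200000 are arbitrary choices for illustration. It reports the best of 3 repeats, as in the runs above, but absolute timings depend on the machine and interpreter, so no expected output is given.

```python
import timeit

# Setup strings mirror the two classes used in the command lines above.
SLOTS_SETUP = "class A(object): __slots__ = ('a', 'b', 'c')\ninst = A()"
DICT_SETUP = "class A(object): pass\ninst = A()"
STMT = "inst.a=5; inst.b=6; inst.c=7"


def best_usec_per_loop(setup, number=200000, repeat=3):
    """Return the best per-loop time in microseconds over `repeat` runs."""
    timings = timeit.repeat(STMT, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    print("(slots)   = %.3f usec per loop" % best_usec_per_loop(SLOTS_SETUP))
    print("(regular) = %.3f usec per loop" % best_usec_per_loop(DICT_SETUP))
```

Running both variants in the same process and on the same machine keeps the comparison fair; differences of a few hundredths of a microsecond are within normal run-to-run noise.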

-- John-John Tedro


Re: How do I automate the removal of all non-ascii characters from my code?

2011-09-12 Thread John-John Tedro
On Mon, Sep 12, 2011 at 8:17 AM, Gary Herron wrote:

> On 09/12/2011 12:49 AM, Alec Taylor wrote:
>
>> Good evening,
>>
>> I have converted ODT to HTML using LibreOffice Writer, because I want
>> to convert from HTML to Creole using python-creole. Unfortunately I
>> get this error: "File "Convert to Creole.py", line 17
>> SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py
>> on line 18, but no encoding declared; see
>> http://www.python.org/peps/pep-0263.html for details".
>>
>> Unfortunately I can't post my document yet (it's a research paper I'm
>> working on), but I'm sure you'll get the same result if you write up a
>> document in LibreOffice Writer and add some End Notes.
>>
>> How do I automate the removal of all non-ascii characters from my code?
>>
>> Thanks for all suggestions,
>>
>> Alec Taylor
>>
>
>
>
> This question does not quite make sense.  The error message is complaining
> about a python file.  What does that file have to do with ODT to HTML
> conversion and LibreOffice?
>
> The error message means the python file (wherever it came from) has a
> non-ascii character (as you noted), and so it needs something to tell it
> what such a character means.  (That's what the encoding is.)
>
> A comment like this in line 1 or 2 will specify an encoding:
>  # -*- coding: <encoding name> -*-
> but, we'll have to know more about the file "Convert to Creole.py" to guess
> what encoding name should be specified there.
>
> You might try utf-8 or latin-1.
>

If you are having trouble figuring out which encoding your file has, the
"file" utility is often a quick-and-dirty solution:

#> echo "åäö" > test.txt
#> file test.txt
test.txt: UTF-8 Unicode text
#> iconv test.txt -f utf-8 -t latin1 > test.l1.txt
#> file test.l1.txt
test.l1.txt: ISO-8859 text

Note: I use latin1 (ISO-8859-1) because it can represent the characters 'å',
'ä' and 'ö'. Your encoding might be different depending on which system you
are using.
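A rough Python counterpart to the "file" check is to try decoding the raw bytes with a few candidate encodings in order. This is a minimal sketch assuming Python 3; guess_encoding and the candidate list are hypothetical, and it is only a guess, not real detection: latin-1 maps every byte to some character, so it never fails and has to be the last resort.

```python
# Candidate encodings, tried in order. latin-1 accepts any byte sequence,
# so it must stay last; it is a fallback, not evidence.
CANDIDATES = ("utf-8", "latin-1")


def guess_encoding(data, candidates=CANDIDATES):
    """Return the first candidate encoding that decodes `data` without error."""
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


if __name__ == "__main__":
    text = "\u00e5\u00e4\u00f6"  # 'åäö', as in the shell example above
    print(guess_encoding(text.encode("utf-8")))    # utf-8
    print(guess_encoding(text.encode("latin-1")))  # latin-1
```

To check a file, pass it the bytes from open(path, "rb").read().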

The gist is that if you declare the correct encoding with the "coding" comment
mentioned above, your program will most likely run as intended.
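If many files need the declaration, it can be added mechanically. This is a minimal sketch, assuming Python 3; add_coding_declaration is a hypothetical helper. It uses the regular expression given in PEP 263 to detect an existing declaration on line 1 or 2, and keeps a "#!" line first so the declaration lands on line 2.

```python
import re

# PEP 263: a coding declaration on line 1 or 2 must match this pattern.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def add_coding_declaration(source, encoding="utf-8"):
    """Return `source` with a PEP 263 coding comment, unless it already has one."""
    lines = source.splitlines(True)  # keep the original line endings
    if any(CODING_RE.match(line) for line in lines[:2]):
        return source  # already declared on line 1 or 2
    declaration = "# -*- coding: %s -*-\n" % encoding
    if lines and lines[0].startswith("#!"):
        # A shebang must stay on line 1, so the declaration goes on line 2.
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + declaration + "".join(lines[1:])
    return declaration + source
```

This only labels the file; it does not re-encode anything, so the declared name must match the bytes actually on disk. One way to apply it without altering those bytes is to read and write the file with encoding="latin-1" (and newline="" to leave line endings alone), since latin-1 round-trips every byte unchanged.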

-- John-John Tedro


Re: Turkic I and re

2011-09-15 Thread John-John Tedro
On Thu, Sep 15, 2011 at 1:16 PM, Alan Plum  wrote:

> On 2011-09-15 15:02, MRAB wrote:
>
>> The regex module at http://pypi.python.org/pypi/regex currently uses a
>> compromise, where it matches 'I' with 'i' and also 'I' with 'ı' and 'İ'
>> with 'i'.
>>
>> I was wondering if it would be preferable to have a TURKIC flag instead
>> ("(?T)" or "(?T:...)" in the pattern).
>>
>
> I think the problem many people ignore when coming up with solutions like
> this is that while this behaviour is pretty much unique for Turkish script,
> there is no guarantee that Turkish substrings won't appear in other language
> strings (or vice versa).
>
> For example, foreign names in Turkish are often given as spelled in their
> native (non-Turkish) script variants. Likewise, Turkish names in other
> languages are often given as spelled in Turkish.
>
> The Turkish 'I' is a peculiarity that will probably haunt us programmers
> until hell freezes over. Unless Turkey abandons its traditional orthography
> or people start speaking only a single language at a time (including names),
> there's no easy way to deal with this.
>
> In other words: the only way to make use of your proposed flag is if you
> have a fully language-tagged input (e.g. an XML document making extensive
> use of xml:lang) and only ever apply regular expressions to substrings
> containing one culture at a time.
>

Python does not appear to support the special-case mappings; in effect, it is
not 100% compliant with the Unicode standard.

The locale-specific casing of 'i' in Turkic languages is covered in section
5.18 (Case Mappings) of the Unicode standard:
http://www.unicode.org/versions/Unicode6.0.0/ch05.pdf#G21180

AFAIK, the case methods of Python strings seem to be built around the
assumption that len("string") == len("string".upper()), but some of these
casing rules require the string to grow, like uppercasing the German sharp
s "ß", which should be translated to the expanded string "SS".
Some of these special cases are meant to apply only in specific locales, such
as the Turkic rules for "i", but I have not been able to verify that the
Turkic uppercasing of "i" works on any of Python 2.6, 2.7 or 3.1:

  import locale
  # warning: requires the Turkish locale to be installed on your system
  locale.setlocale(locale.LC_ALL, "tr_TR.utf8")
  ord("i".upper()) == 0x130  # is False for me, but should be True
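For comparison, the Turkic rules for the dotted and dotless i can be applied by hand before the normal case conversion. This is a minimal sketch assuming Python 3.3 or later (newer than the versions tested above), where str.upper() already applies unconditional special cases such as "ß" -> "SS" but still ignores the locale. turkic_upper and turkic_lower are hypothetical helpers, not standard library functions, and they cover only I/İ/i/ı, not the rest of SpecialCasing.txt.

```python
# Language-sensitive Turkish/Azeri mappings, applied before the default ones.
TURKIC_UPPER = {ord("i"): "\u0130"}                      # i -> İ
TURKIC_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}  # I -> ı, İ -> i


def turkic_upper(s):
    """Uppercase `s`, mapping dotted i to dotted capital I (U+0130)."""
    return s.translate(TURKIC_UPPER).upper()


def turkic_lower(s):
    """Lowercase `s`, mapping I to dotless i (U+0131) and İ to plain i."""
    return s.translate(TURKIC_LOWER).lower()


if __name__ == "__main__":
    print("istanbul".upper())                # ISTANBUL (default mapping)
    print(turkic_upper("istanbul"))          # İSTANBUL
    print(turkic_lower("D\u0130YARBAKIR"))   # diyarbakır
    print(len("\u0130".lower()))             # 2: default is i + U+0307
    print("\u00df".upper())                  # SS
```

The last two lines also show the length-changing mappings mentioned above: the default lowercase of "İ" is two code points, and "ß" uppercases to "SS".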

I wouldn't be surprised if these issues carry over into the 're' module.

The only support appears to be the 'L' (re.LOCALE) flag, but it only makes
"\w, \W, \b, \B, \s and \S dependent on the current locale".
That probably does not extend to the special rules mentioned above, but I
could be wrong. Make sure that your locale is set correctly and test again.
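Without a Turkic flag, one user-space workaround for matching literal words is to expand each letter into an explicit character class of its case variants, with Turkish pairings for the i letters, and match without re.IGNORECASE. This is a minimal sketch assuming Python 3.4+ (for re.fullmatch); turkic_pattern is a hypothetical helper, it only handles literal text (no regex syntax in the input), and it drops multi-character variants such as "SS" for "ß".

```python
import re

# Turkish case partners: dotted i <-> İ, dotless ı <-> I.
TURKIC_VARIANTS = {
    "i": "i\u0130", "\u0130": "i\u0130",
    "\u0131": "\u0131I", "I": "\u0131I",
}


def turkic_pattern(word):
    """Build a regex matching literal `word` case-insensitively, Turkish style."""
    parts = []
    for ch in word:
        variants = TURKIC_VARIANTS.get(ch)
        if variants is None:
            # Keep only single-character case variants ('ß'.upper() is 'SS').
            variants = "".join(sorted(
                {v for v in (ch, ch.lower(), ch.upper()) if len(v) == 1}))
        if len(variants) == 1:
            parts.append(re.escape(variants))
        else:
            parts.append("[" + "".join(re.escape(v) for v in variants) + "]")
    return "".join(parts)


if __name__ == "__main__":
    pattern = turkic_pattern("istanbul")
    print(re.fullmatch(pattern, "\u0130STANBUL") is not None)  # True
    print(re.fullmatch(pattern, "ISTANBUL") is not None)       # False
```

As Alan points out above, this only makes sense if you already know the text is Turkish; applied to mixed-language input it will reject matches a non-Turkish reader would expect.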

If that doesn't work, don't expect a 'Turkic flag' to be introduced into the
re module any time soon, given this line from PEP 20:
"Special cases aren't special enough to break the rules."

Cheers,
-- John-John Tedro


Fwd: Turkic I and re

2011-09-16 Thread John-John Tedro
On Fri, Sep 16, 2011 at 7:25 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> Thomas Rachel wrote:
>
> > Am 15.09.2011 15:16 schrieb Alan Plum:
> >
> >> The Turkish 'I' is a peculiarity that will probably haunt us programmers
> >> until hell freezes over.
>
>
> Meh, I don't think it's much more peculiar than any other diacritic issue.
> If I'm German or English, I probably want ö and O to match during
> case-insensitive comparisons, so that Zöe and ZOE match. If I'm Icelandic,
> I don't. I don't really see why Turkic gets singled out.
>
>
> > That's why it would have been nice if the Unicode guys had defined "both
> > Turkish i-s" at separate codepoints.
> >
> > Then one could have the three pairs
> > I, i ("normal")
> > I (other one), ı
> >
> > and
> >
> > İ, i (the other one).
>
> And then people will say, "How can I match both sorts of dotless uppercase
> I but not dotted I when I'm doing comparisons?"
>
>
>
> --
> Steven
>

Yeah, it's more likely that language conventions and functions would grow
around characters that look right.

No one except developers cares which specific code point a character has, so
you would soon have a mish-mash of special rules for converting between the
special cases.
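To see the developer's view of the same letters, Python's unicodedata module lists each one as a separate code point with its own name, even where they look nearly identical. This is a minimal sketch assuming Python 3; describe is a hypothetical helper.

```python
import unicodedata


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


if __name__ == "__main__":
    # Visually close letters, but four distinct code points: I, İ, i, ı
    for ch in "I\u0130i\u0131":
        print(ch, describe(ch))
```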

P.S. Sorry Steven, I missed clicking "reply to all".

-- John-John Tedro