[Python-Dev] 2.5 slower than 2.4 for some things?
I've had a report from a user that Plex runs about half
as fast in 2.5 as it did in 2.4. In particular, the
NFA-to-DFA conversion phase, which does a lot of
messing about with dicts representing mappings between
sets of states.
Does anyone in the Ministry for Making Python Blazingly
fast happen to know of some change that might have
pessimised things in this area?
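For context, the dict-heavy phase in question is the standard subset construction, which keys dicts by (frozen) sets of NFA states. A minimal sketch of the idea (not Plex's actual code; epsilon moves omitted for brevity):

def nfa_to_dfa(start, transitions, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states.

    transitions: dict mapping (nfa_state, symbol) -> set of nfa_states.
    """
    start_set = frozenset([start])
    dfa = {}                      # frozenset -> {symbol: frozenset}
    todo = [start_set]
    while todo:
        state = todo.pop()
        if state in dfa:
            continue
        dfa[state] = {}
        for sym in alphabet:
            # Union of all NFA transitions out of this state set on sym.
            target = frozenset(
                t for s in state for t in transitions.get((s, sym), ()))
            if target:
                dfa[state][sym] = target
                todo.append(target)
    return dfa

Every step here hashes and compares frozensets as dict keys, which is why a change to hashing/comparison speed shows up so strongly in this phase.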
--
Greg
--- Begin Message ---
Hi,
I have been using Plex now for several years and really like it very much!
Recently I switched from python 2.4 to 2.5 and I noticed that the parser runs
significantly slower with 2.5. I hope you do not mind that I attach an example
script and two profiler logs which show the difference. The difference is almost
a factor of 2. Do you have an idea why that might happen and is there anything
one could do to improve the performance?
Regards, Christian
--
Christian Kristukat ::
Institut fuer Festkoerperphysik, TU Berlin ==
[EMAIL PROTECTED] ||
Tel. +49-30-20896371
from Plex import *
from Plex.Traditional import re as regex
class ParseString:
    def __init__(self, parse_str):
        self.parse_str = parse_str
        self.EOF = 0

    def read(self, size):
        if self.EOF:
            return ''
        else:
            self.EOF = 1
            return self.parse_str

    def reset(self):
        self.EOF = 0

class SymParser:
    def __init__(self, tok):
        self.pstr = ParseString(tok)
        self.count = 0
        self.varlist = {}
        self.dummy = []
        self.nvars = 0
        self.varfunc = self.setvar

    def setvar(self, scanner, name):
        if name in ['caller', 'e', 'pi']:
            return name
        if name not in self.varlist:
            self.varlist[name] = ['ns', self.nvars]
            self.dummy.append(name)
            ret = 'a[%d]' % self.nvars
            self.nvars += 1
        else:
            ret = 'a[%d]' % (self.dummy.index(name) + self.count)
        return ret

    def parse(self):
        letter = regex('[A-Za-z_]')
        digit = Range("09")
        dot = Str(".")
        rnumber = (Rep1(digit) + dot + Rep1(digit)) | Rep1(digit)
        expnumber = Rep1(digit) + dot + Rep1(digit) + Str('e') + (Any('-+') | Empty) + Rep1(digit)
        cnumber = (Rep1(digit) + dot + Rep1(digit) + Str('j')) | (Rep1(digit) + Str('j'))
        number = rnumber | cnumber | expnumber
        x = Str("x")
        name = Rep1(letter) | (Rep1(letter) + Rep1(digit) + Rep(letter))
        inst_member = (name | Str(")") | digit) + dot + name
        parname = Str(r"'") + name + Str(r"'")
        func = name + Str("(")
        op = Any("^+-/*(),")
        space = Any(" \t\n\r")
        lex = Lexicon([
            (number, TEXT),
            (x, TEXT),
            (func, TEXT),
            (parname, TEXT),
            (inst_member, TEXT),
            (name, self.varfunc),
            (op, TEXT),
            (space, IGNORE),
            (AnyChar, IGNORE)
        ])
        parsed = ""
        scanner = Scanner(lex, self.pstr, "pparse")
        while 1:
            tok = scanner.read()
            if tok[0] is None:
                break
            parsed += tok[0]
        self.count += 1
        return self.varlist

def sym():
    for x in range(10):
        a = SymParser('amp*exp(-(x-pos)**2/fwhm)')
        a.parse()
    print a

def prof_sym():
    import profile
    import pstats
    profile.run('sym()', 'modelprof')
    p = pstats.Stats('modelprof')
    p.strip_dirs()
    p.sort_stats('cumulative')
    p.print_stats()

if __name__ == '__main__':
    prof_sym()
<__main__.SymParser instance at 0xb7c2d34c>
Sat Jun  9 21:45:53 2007    modelprof

106631 function calls (104491 primitive calls) in 1.700 CPU seconds

Ordered by: cumulative time

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    1.700    1.700 plex_test2.py:81(sym)
      1    0.000    0.000    1.700    1.700 profile:0(sym())
      1    0.000    0.000    1.700    1.700 :1(?)
     10    0.000    0.000    1.700    0.170 plex_test2.py:42(parse)
     10    0.000    0.000    1.560    0.156 Lexicons.py:113(__init__)
     10    0.190    0.019    1.260    0.126 DFA.py:13(nfa_to_dfa)
   1350    0.070    0.000    0.310    0.000 DFA.py:100(old_to_new)
     90    0.010    0.000    0.300    0.003 Lexicons.py:158(add_token_to_machine)
 530/90    0.030    0.000    0.270    0.003 Regexps.py:362(build_machine)
590/100    0.020    0.000    0.240    0.002 Regexps.py:315(build_machine)
   2600    0.090    0.000    0.220    0.000 DFA.py:50(set_epsilon_closure)
   2800    0.170    0.000    0.220    0.000 Transitions.py:91(items)
   1350    0.050    0.000    0.190    0.000 DFA.py:140(make_key)
    290    0.020    0.000    0.180    0.001 Regexps.py:384(build_machine)
   1340    0.100    0.000    0.150    0.000 Machines.py:180(add_transitions)
   2600    0.070    0.000 [remainder of profiler output truncated]
Re: [Python-Dev] Instance variable access and descriptors
Phillip J. Eby wrote:
> ...at the cost of slowing down access to properties and __slots__, by
> adding an *extra* dictionary lookup there.

Rather than spend time tinkering with the lookup order, it might be more
productive to look into implementing a cache for attribute lookups. That
would help with method lookups as well, which are probably more frequent
than instance var accesses.

--
Greg

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Instance variable access and descriptors
On Tue, Jun 12, 2007 at 08:10:26PM +1200, Greg Ewing wrote:
> Phillip J. Eby wrote:
> > ...at the cost of slowing down access to properties and __slots__, by
> > adding an *extra* dictionary lookup there.
>
> Rather than spend time tinkering with the lookup order,
> it might be more productive to look into implementing
> a cache for attribute lookups. That would help with
> method lookups as well, which are probably more
> frequent than instance var accesses.

Was wondering the same; specifically, hijacking the PEP 280 celldict
approach for this. Downside: this would break code that tries to do
PyDict_* calls on a class tp_dict; I haven't dug extensively, but I'm sure
there are a few out there.

Main thing I like about that approach is that it avoids the staleness
verification, single lookup - it's there or it isn't. It would also be
reusable for 280. If folks don't much like the hit from tracing back to a
cell holding an actual value, it could always be implemented such that upon
change, the change propagates out to registered instances (iow, change
a.__dict__, it notifies b.__dict__ of the change, etc., till it hits a
point where the change doesn't need to go further).

~harring
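To make the idea concrete, here is a toy pure-Python sketch of the kind of (type, name) lookup cache being discussed. This is illustration only: the real proposal concerns CPython's C-level lookup, and the `cached_class_lookup`/`invalidate` names and explicit-invalidation scheme are invented for the example.

# Toy (type, name) -> attribute cache with explicit invalidation on
# class mutation; this sidesteps per-lookup staleness checks at the
# cost of having to notify the cache when a class changes.
_cache = {}

def cached_class_lookup(obj, name):
    klass = type(obj)
    key = (klass, name)
    if key not in _cache:
        # Walk the MRO once and remember the attribute we found.
        for base in klass.__mro__:
            if name in base.__dict__:
                _cache[key] = base.__dict__[name]
                break
        else:
            raise AttributeError(name)
    return _cache[key]

def invalidate(klass):
    """Must be called whenever klass (or a base of it) is mutated."""
    for key in [k for k in _cache if issubclass(k[0], klass)]:
        del _cache[key]

The hard part, as the thread notes, is exactly this invalidation step: changes to a class (or any base) must reach every cached entry that could be affected.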
Re: [Python-Dev] 2.5 slower than 2.4 for some things?
> I've had a report from a user that Plex runs about half
> as fast in 2.5 as it did in 2.4. In particular, the
> NFA-to-DFA conversion phase, which does a lot of
> messing about with dicts representing mappings between
> sets of states.
>
> Does anyone in the Ministry for Making Python Blazingly
> fast happen to know of some change that might have
> pessimised things in this area?
Hello, I investigated. On my environment, consumed time is
E:\Plex-1.1.5>py24 plex_test2.py
0.71065668
E:\Plex-1.1.5>py25 plex_test2.py
0.92131335
And after I applied this patch to Plex/Machines, (make `Node' new style
class)
62c62
< class Node:
---
> class Node(object):
E:\Plex-1.1.5>py24 plex_test2.py
0.40122888
E:\Plex-1.1.5>py25 plex_test2.py
0.350999832153
So probably the hash/comparison mechanism of old/new style classes has
changed (improved for new-style classes, worse for old-style classes).
Maybe it was optimized for new-style classes?
Try this for a minimal test:

import timeit

init = """
class Class:
    pass

c1 = Class()
c2 = Class()
"""

t1 = timeit.Timer("""
c1 < c2
""", init)

t2 = timeit.Timer("""
hash(c1)
hash(c2)
""", init)

print t1.timeit(1000)
print t2.timeit(1000)
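For reference, the same micro-benchmark can be run against a new-style class by changing only the setup; this variant is not in the original post, and only the hash timing is repeated here since it is the part relevant to the dict-heavy NFA-to-DFA phase:

import timeit

# Same test as above, but with `Class' made a new-style class.
init_new = """
class Class(object):
    pass

c1 = Class()
c2 = Class()
"""

t_hash = timeit.Timer("hash(c1); hash(c2)", init_new)
print(t_hash.timeit(1000))

Comparing this number against the old-style t2 above shows whether hashing is the operation that was optimized for new-style classes.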
Re: [Python-Dev] Question about dictobject.c:lookdict_string
On 6/11/07, Carl Friedrich Bolz <[EMAIL PROTECTED]> wrote:
> Eyal Lotem wrote:
> > My question is specifically regarding the transition back from
> > lookdict_string (the initial value) to the general lookdict.
> >
> > Currently, when a string-only dict is trying to look up any
> > non-string, it reverts back to a general lookdict.
> >
> > Wouldn't it be better (especially in the more important case of a
> > string-key-only dict), to revert to the generic lookdict when a
> > non-string is inserted to the dict, rather than when one is being
> > searched?
> [...]
> > This does not seem like a significant issue, but as I know a lot of
> > effort went into optimizing dicts, I was wondering if I am missing
> > something here.
>
> Yes, you are: when doing a lookup with a non-string key, that key could
> be an instance of a class that has __hash__ and __eq__ implementations
> that make the key compare equal to some string that is in the
> dictionary. So you need to change to lookdict, otherwise that lookup
> might fail.

Ah, thanks for the clarification. But doesn't it make sense to only revert
that single lookup, and not modify the function ptr until the dict contains
a non-string?

Eyal
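To illustrate Carl Friedrich's point, here is a minimal sketch of a non-string key whose __hash__/__eq__ make it compare equal to a string already in the dict. The `StrAlias` class is hypothetical, invented for this example; it is not from the thread or from CPython:

class StrAlias(object):
    """Non-string key that hashes and compares equal to a given string."""
    def __init__(self, s):
        self.s = s

    def __hash__(self):
        return hash(self.s)

    def __eq__(self, other):
        return self.s == other

d = {'spam': 1}               # string-only dict: fast lookdict_string path
print(d[StrAlias('spam')])    # non-string lookup must still find 'spam'

This is why the lookup (not just the insert) has to fall back to the general lookdict: the fast string-only probe would miss keys like this one.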
[Python-Dev] [RFC] urlparse - parse query facility
Hi all,
This mail is a request for comments on changes to the urlparse module. We
understand that urlparse returns the complete query value as the query
component and does not provide facilities to separate the query into its
components. Users have to use the cgi module (cgi.parse_qs) to get the
query parsed.

There has been a discussion in the past on having a query-string parsing
method available from the urlparse module itself. [1]

To implement the query parsing feature in the urlparse module, we can:

a) import cgi and call the cgi module's parse_qs.
   This approach has problems, as it
   i) imports cgi into the urlparse module, and
   ii) the cgi module in turn imports urllib and urlparse (a circular
   dependency).

b) Implement a standalone query parsing facility in urlparse *AS IN* the
   cgi module.

The method below implements urlparse_qs(url, keep_blank_values,
strict_parsing), which helps in parsing the query component of the URL. It
behaves the same as cgi.parse_qs.
Please let me know your comments on the below code.
--
# _hextochr is urllib's internal map from a pair of hex digits to the
# corresponding character; it is reproduced here so that unquote() is
# self-contained.
_hextochr = dict(('%02x' % i, chr(i)) for i in range(256))
_hextochr.update(('%02X' % i, chr(i)) for i in range(256))

def unquote(s):
    """unquote('abc%20def') -> 'abc def'."""
    res = s.split('%')
    for i in xrange(1, len(res)):
        item = res[i]
        try:
            res[i] = _hextochr[item[:2]] + item[2:]
        except KeyError:
            res[i] = '%' + item
        except UnicodeDecodeError:
            res[i] = unichr(int(item[:2], 16)) + item[2:]
    return "".join(res)

def urlparse_qs(url, keep_blank_values=0, strict_parsing=0):
    """Parse a URL query string and return the components as a dictionary.

    Based on the cgi.parse_qs method. This is a utility function provided
    with urlparse so that users need not use the cgi module for parsing
    the URL query string.

    Arguments:

    url: URL with query string to be parsed

    keep_blank_values: flag indicating whether blank values in
        URL encoded queries should be treated as blank strings.
        A true value indicates that blanks should be retained as
        blank strings. The default false value indicates that
        blank values are to be ignored and treated as if they were
        not included.

    strict_parsing: flag indicating what to do with parsing errors.
        If false (the default), errors are silently ignored.
        If true, errors raise a ValueError exception.
    """
    scheme, netloc, url, params, querystring, fragment = urlparse(url)
    pairs = [s2 for s1 in querystring.split('&') for s2 in s1.split(';')]
    query = []
    for name_value in pairs:
        if not name_value and not strict_parsing:
            continue
        nv = name_value.split('=', 1)
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError, "bad query field: %r" % (name_value,)
            # Handle case of a control-name with no equal sign
            if keep_blank_values:
                nv.append('')
            else:
                continue
        if len(nv[1]) or keep_blank_values:
            name = unquote(nv[0].replace('+', ' '))
            value = unquote(nv[1].replace('+', ' '))
            query.append((name, value))
    result = {}
    for name, value in query:
        if name in result:
            result[name].append(value)
        else:
            result[name] = [value]
    return result
--
Testing:
$ python
Python 2.6a0 (trunk, Jun 10 2007, 12:04:03)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> dir(urlparse)
['BaseResult', 'MAX_CACHE_SIZE', 'ParseResult', 'SplitResult', '__all__',
'__builtins__', '__doc__', '__file__', '__name__', '_parse_cache',
'_splitnetloc', '_splitparams', 'clear_cache', 'non_hierarchical',
'scheme_chars', 'test', 'test_input', 'unquote', 'urldefrag', 'urljoin',
'urlparse', 'urlparse_qs', 'urlsplit', 'urlunparse', 'urlunsplit',
'uses_fragment', 'uses_netloc', 'uses_params', 'uses_query', 'uses_relative']
>>> URL = 'http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=south+africa+travel+cape+town'
>>> print urlparse.urlparse_qs(URL)
{'q': ['south africa travel cape town'], 'oe': ['utf-8'], 'ie': ['UTF-8'],
'hl': ['en']}
>>> print urlparse.urlparse_qs(URL,keep_blank_values=1)
{'q': ['south africa travel cape town'], 'ie': ['UTF-8'], 'oe': ['utf-8'],
'lr': [''], 'hl': ['en']}
>>>
Thanks,
Senthil
[1] http://mail.python.org/pipermail/tutor/2002-August/016823.html
--
O.R.Senthil Kumaran
http://phoe6.livejournal.com
[Python-Dev] Requesting commit access to python sandbox. Cleanup urllib2 - Summer of Code 2007 Project
Hi,

I am a student participant of Google Summer of Code 2007 and I am working
on the cleanup task of urllib2, with Skip as my mentor. I would like to
request commit access to the Python sandbox for implementing the changes
as part of the project. I have attached my SSH public keys.

preferred name: senthil.kumaran

I am following up and adding comments to the urllib-related bugs at the
sf.net page. I would also like to request addition of my SourceForge id,
orsenthil, to the python project, so I can close the defects raised
against the urllib modules.

Summer of Code project:
http://code.google.com/soc/psf/appinfo.html?csaid=E73A6612F80229B6

The project actually commenced on May 28th itself, but there was a delay
from my side in getting started. Ivan Sutherland's essay on Technology and
Courage [1] did some good to me. :-)

Thanks,
Senthil

[1] http://research.sun.com/techrep/Perspectives/smli_ps-1.pdf#search=%22sutherland%20courage%22

--
O.R.Senthil Kumaran
http://phoe6.livejournal.com
Re: [Python-Dev] 2.5 slower than 2.4 for some things?
ocean wrote:
> So probably the hash/comparison mechanism of old/new style classes has
> changed (improved for new-style classes, worse for old-style classes).
> Maybe it was optimized for new-style classes?

Thanks -- it looks like there's a simple solution that will make Plex even
faster! I'll pass this on to the OP.

--
Greg
Re: [Python-Dev] 2.5 slower than 2.4 for some things?
ocean wrote:
>> I've had a report from a user that Plex runs about half
>> as fast in 2.5 as it did in 2.4. In particular, the
>> NFA-to-DFA conversion phase, which does a lot of
>> messing about with dicts representing mappings between
>> sets of states.
That was me.
>> Does anyone in the Ministry for Making Python Blazingly
>> fast happen to know of some change that might have
>> pessimised things in this area?
>
> Hello, I investigated. On my environment, consumed time is
>
> E:\Plex-1.1.5>py24 plex_test2.py
> 0.71065668
>
> E:\Plex-1.1.5>py25 plex_test2.py
> 0.92131335
>
> And after I applied this patch to Plex/Machines, (make `Node' new style
> class)
>
> 62c62
> < class Node:
> ---
>> class Node(object):
>
> E:\Plex-1.1.5>py24 plex_test2.py
> 0.40122888
>
> E:\Plex-1.1.5>py25 plex_test2.py
> 0.350999832153
>
Nice!

Meanwhile I tried to replace the parsing I did with Plex with re.Scanner,
and again there is a remarkable speed difference; again Python 2.5 is
slower:
try:
    from re import Scanner
except ImportError:
    from sre import Scanner

pars = {}
order = []
count = 0

def par(scanner, name):
    global count, order, pars
    if name in ['caller', 'e', 'pi']:
        return name
    if name not in pars:
        pars[name] = ('ns', count)
        order.append(name)
        ret = 'a[%d]' % count
        count += 1
    else:
        ret = 'a[%d]' % order.index(name)
    return ret

scanner = Scanner([
    (r"x", lambda y, x: x),
    (r"[a-zA-Z]+\.", lambda y, x: x),
    (r"[a-z]+\(", lambda y, x: x),
    (r"[a-zA-Z_]\w*", par),
    (r"\d+\.\d*", lambda y, x: x),
    (r"\d+", lambda y, x: x),
    (r"\+|-|\*|/", lambda y, x: x),
    (r"\s+", None),
    (r"\)+", lambda y, x: x),
    (r"\(+", lambda y, x: x),
    (r",", lambda y, x: x),
])

import profile
import pstats

def run():
    arg = '+amp*exp(-(x-pos)/fwhm)'
    for i in range(100):
        scanner.scan(arg)

profile.run('run()', 'profscanner')
p = pstats.Stats('profscanner')
p.strip_dirs()
p.sort_stats('cumulative')
p.print_stats()
Christian
[Python-Dev] minimal configuration for python on a DSP (C64xx family of TI)
Dear all,

We want to make Python run on DSP processors (the C64xx family from TI).
What would be a minimal configuration (of modules, C files, ...) to make it
start running (without all of the things useful to add once it runs)?

Any hints welcome,

Roland Geibel
[EMAIL PROTECTED]
[Python-Dev] Representation of 'nan'
The repr() for a float of 'inf' or 'nan' is generated as a string (not a
string literal). Perhaps this is only important in how one defines repr().
I've filed a bug, but am not sure if there is a clear solution.
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1732212&group_id=5470
# Repr with a tuple of floats
>>> repr((1.0, 2.0, 3.0))
'(1.0, 2.0, 3.0)'
>>> eval(_)
(1.0, 2.0, 3.0)

# Repr with a tuple of floats, plus nan
>>> repr((1.0, float('nan'), 3.0))
'(1.0, nan, 3.0)'
>>> eval(_)
NameError: name 'nan' is not defined
There are a few alternatives I can think of that are fairly clean. I think
I'd prefer any of these over the current 'nan' implementation. I don't
think it is worth adding a nan literal to the language, but something could
be changed so that the repr of nan meant something.

The best option in my opinion would be adding attributes to float, so that
float.nan, float.inf, and float.ninf are accessible. This could also help
with the odd situations of checking for these out-of-range values. With
that in place, repr could return 'float.nan' instead of 'nan'. This would
make the repr string evaluatable again (in contexts where __builtins__ has
not been molested).

Another option could be for repr to return 'float("nan")' for these, which
would also evaluate correctly, but this doesn't seem a clean use for repr.

Is this worth even changing? It's just an irregularity that has come up and
surprised a few of us developers.
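A small sketch of the irregularity and the round-trip that does work today; note that `float.nan` above is only a proposal, while `float('nan')` and the x != x test for NaN are existing behavior:

x = float('nan')

# repr() yields the bare string 'nan', which is not a valid expression:
try:
    eval(repr(x))          # eval('nan') -> NameError: 'nan' not defined
except NameError as e:
    print('eval failed:', e)

# The constructor, by contrast, does accept the repr string directly:
same = float(repr(x))
assert same != same        # NaN is the only value unequal to itself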
Re: [Python-Dev] TLSAbruptCloseError
> Any thoughts?

My main thought: this posting is off-topic for python-dev. This list is for
the development of Python itself; use comp.lang.python for discussing
development *with* Python. However, that may still be the wrong place -
perhaps you had better ask in a Java forum?

Regards,
Martin
