[Python-Dev] 2.5 slower than 2.4 for some things?
I've had a report from a user that Plex runs about half
as fast in 2.5 as it did in 2.4. In particular, the
NFA-to-DFA conversion phase, which does a lot of
messing about with dicts representing mappings between
sets of states.
Does anyone in the Ministry for Making Python Blazingly
fast happen to know of some change that might have
pessimised things in this area?
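For context, the dict-heavy phase in question is the standard subset construction, which keys dicts by (frozen) sets of NFA states. A minimal sketch of the idea (not Plex's actual code; epsilon moves omitted for brevity):

def nfa_to_dfa(start, transitions, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states.

    transitions: dict mapping (nfa_state, symbol) -> set of nfa_states.
    """
    start_set = frozenset([start])
    dfa = {}                      # frozenset -> {symbol: frozenset}
    todo = [start_set]
    while todo:
        state = todo.pop()
        if state in dfa:
            continue
        dfa[state] = {}
        for sym in alphabet:
            # Union of all NFA transitions out of this state set on sym.
            target = frozenset(
                t for s in state for t in transitions.get((s, sym), ()))
            if target:
                dfa[state][sym] = target
                todo.append(target)
    return dfa

Every step here hashes and compares frozensets as dict keys, which is why a change to hashing/comparison speed shows up so strongly in this phase.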
--
Greg
--- Begin Message ---
Hi,
I have been using Plex now for several years and really like it very much!
Recently I switched from python 2.4 to 2.5 and I noticed that the parser runs
significantly slower with 2.5. I hope you do not mind that I attach an example
script and two profiler logs which show the difference. The difference is almost
a factor of 2. Do you have an idea why that might happen and is there anything
one could do to improve the performance?
Regards, Christian
--
Christian Kristukat ::
Institut fuer Festkoerperphysik, TU Berlin ==
[EMAIL PROTECTED] ||
Tel. +49-30-20896371
from Plex import *
from Plex.Traditional import re as regex
class ParseString:
    def __init__(self, parse_str):
        self.parse_str = parse_str
        self.EOF = 0

    def read(self, size):
        if self.EOF:
            return ''
        else:
            self.EOF = 1
            return self.parse_str

    def reset(self):
        self.EOF = 0

class SymParser:
    def __init__(self, tok):
        self.pstr = ParseString(tok)
        self.count = 0
        self.varlist = {}
        self.dummy = []
        self.nvars = 0
        self.varfunc = self.setvar

    def setvar(self, scanner, name):
        if name in ['caller', 'e', 'pi']:
            return name
        if name not in self.varlist:
            self.varlist[name] = ['ns', self.nvars]
            self.dummy.append(name)
            ret = 'a[%d]' % self.nvars
            self.nvars += 1
        else:
            ret = 'a[%d]' % (self.dummy.index(name) + self.count)
        return ret

    def parse(self):
        letter = regex('[A-Za-z_]')
        digit = Range("09")
        dot = Str(".")
        rnumber = (Rep1(digit) + dot + Rep1(digit)) | Rep1(digit)
        expnumber = Rep1(digit) + dot + Rep1(digit) + Str('e') + (Any('-+') | Empty) + Rep1(digit)
        cnumber = (Rep1(digit) + dot + Rep1(digit) + Str('j')) | (Rep1(digit) + Str('j'))
        number = rnumber | cnumber | expnumber
        x = Str("x")
        name = Rep1(letter) | (Rep1(letter) + Rep1(digit) + Rep(letter))
        inst_member = (name | Str(")") | digit) + dot + name
        parname = Str(r"'") + name + Str(r"'")
        func = name + Str("(")
        op = Any("^+-/*(),")
        space = Any(" \t\n\r")
        lex = Lexicon([
            (number, TEXT),
            (x, TEXT),
            (func, TEXT),
            (parname, TEXT),
            (inst_member, TEXT),
            (name, self.varfunc),
            (op, TEXT),
            (space, IGNORE),
            (AnyChar, IGNORE)
        ])
        parsed = ""
        scanner = Scanner(lex, self.pstr, "pparse")
        while 1:
            tok = scanner.read()
            if tok[0] is None:
                break
            parsed += tok[0]
        self.count += 1
        return self.varlist

def sym():
    for x in range(10):
        a = SymParser('amp*exp(-(x-pos)**2/fwhm)')
        a.parse()
    print a

def prof_sym():
    import profile
    import pstats
    profile.run('sym()', 'modelprof')
    p = pstats.Stats('modelprof')
    p.strip_dirs()
    p.sort_stats('cumulative')
    p.print_stats()

if __name__ == '__main__':
    prof_sym()
<__main__.SymParser instance at 0xb7c2d34c>
Sat Jun  9 21:45:53 2007    modelprof

106631 function calls (104491 primitive calls) in 1.700 CPU seconds

Ordered by: cumulative time

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    1.700    1.700 plex_test2.py:81(sym)
      1    0.000    0.000    1.700    1.700 profile:0(sym())
      1    0.000    0.000    1.700    1.700 :1(?)
     10    0.000    0.000    1.700    0.170 plex_test2.py:42(parse)
     10    0.000    0.000    1.560    0.156 Lexicons.py:113(__init__)
     10    0.190    0.019    1.260    0.126 DFA.py:13(nfa_to_dfa)
   1350    0.070    0.000    0.310    0.000 DFA.py:100(old_to_new)
     90    0.010    0.000    0.300    0.003 Lexicons.py:158(add_token_to_machine)
 530/90    0.030    0.000    0.270    0.003 Regexps.py:362(build_machine)
590/100    0.020    0.000    0.240    0.002 Regexps.py:315(build_machine)
   2600    0.090    0.000    0.220    0.000 DFA.py:50(set_epsilon_closure)
   2800    0.170    0.000    0.220    0.000 Transitions.py:91(items)
   1350    0.050    0.000    0.190    0.000 DFA.py:140(make_key)
    290    0.020    0.000    0.180    0.001 Regexps.py:384(build_machine)
   1340    0.100    0.000    0.150    0.000 Machines.py:180(add_transitions)
   2600    0.070    0.000 [remainder of profiler output truncated]
Re: [Python-Dev] Instance variable access and descriptors
Phillip J. Eby wrote:
> ...at the cost of slowing down access to properties and __slots__, by
> adding an *extra* dictionary lookup there.

Rather than spend time tinkering with the lookup order, it might be more
productive to look into implementing a cache for attribute lookups. That
would help with method lookups as well, which are probably more frequent
than instance var accesses.

--
Greg

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Instance variable access and descriptors
On Tue, Jun 12, 2007 at 08:10:26PM +1200, Greg Ewing wrote:
> Phillip J. Eby wrote:
> > ...at the cost of slowing down access to properties and __slots__, by
> > adding an *extra* dictionary lookup there.
>
> Rather than spend time tinkering with the lookup order,
> it might be more productive to look into implementing
> a cache for attribute lookups. That would help with
> method lookups as well, which are probably more
> frequent than instance var accesses.

Was wondering the same; specifically, hijacking the PEP 280 celldict
approach for this. Downside: this would break code that tries to do
PyDict_* calls on a class tp_dict; I haven't dug extensively, but I'm sure
there are a few out there.

Main thing I like about that approach is that it avoids the staleness
verification, single lookup - it's there or it isn't. It would also be
reusable for 280. If folks don't much like the hit from tracing back to a
cell holding an actual value, it could always be implemented such that upon
change, the change propagates out to registered instances (iow, change
a.__dict__, it notifies b.__dict__ of the change, etc., till it hits a
point where the change doesn't need to go further).

~harring
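To make the idea concrete, here is a toy pure-Python sketch of the kind of (type, name) lookup cache being discussed. This is illustration only: the real proposal concerns CPython's C-level lookup, and the `cached_class_lookup`/`invalidate` names and explicit-invalidation scheme are invented for the example.

# Toy (type, name) -> attribute cache with explicit invalidation on
# class mutation; this sidesteps per-lookup staleness checks at the
# cost of having to notify the cache when a class changes.
_cache = {}

def cached_class_lookup(obj, name):
    klass = type(obj)
    key = (klass, name)
    if key not in _cache:
        # Walk the MRO once and remember the attribute we found.
        for base in klass.__mro__:
            if name in base.__dict__:
                _cache[key] = base.__dict__[name]
                break
        else:
            raise AttributeError(name)
    return _cache[key]

def invalidate(klass):
    """Must be called whenever klass (or a base of it) is mutated."""
    for key in [k for k in _cache if issubclass(k[0], klass)]:
        del _cache[key]

The hard part, as the thread notes, is exactly this invalidation step: changes to a class (or any base) must reach every cached entry that could be affected.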
Re: [Python-Dev] 2.5 slower than 2.4 for some things?
> I've had a report from a user that Plex runs about half
> as fast in 2.5 as it did in 2.4. In particular, the
> NFA-to-DFA conversion phase, which does a lot of
> messing about with dicts representing mappings between
> sets of states.
>
> Does anyone in the Ministry for Making Python Blazingly
> fast happen to know of some change that might have
> pessimised things in this area?
Hello, I investigated. On my environment, consumed time is
E:\Plex-1.1.5>py24 plex_test2.py
0.71065668
E:\Plex-1.1.5>py25 plex_test2.py
0.92131335
And after I applied this patch to Plex/Machines, (make `Node' new style
class)
62c62
< class Node:
---
> class Node(object):
E:\Plex-1.1.5>py24 plex_test2.py
0.40122888
E:\Plex-1.1.5>py25 plex_test2.py
0.350999832153
So probably the hash/comparison mechanism of old/new style classes has
changed (improved for new-style classes, worse for old-style classes).
Maybe it was optimized for new-style classes?
Try this for a minimal test:

import timeit

init = """
class Class:
    pass

c1 = Class()
c2 = Class()
"""

t1 = timeit.Timer("""
c1 < c2
""", init)

t2 = timeit.Timer("""
hash(c1)
hash(c2)
""", init)

print t1.timeit(1000)
print t2.timeit(1000)
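For reference, the same micro-benchmark can be run against a new-style class by changing only the setup; this variant is not in the original post, and only the hash timing is repeated here since it is the part relevant to the dict-heavy NFA-to-DFA phase:

import timeit

# Same test as above, but with `Class' made a new-style class.
init_new = """
class Class(object):
    pass

c1 = Class()
c2 = Class()
"""

t_hash = timeit.Timer("hash(c1); hash(c2)", init_new)
print(t_hash.timeit(1000))

Comparing this number against the old-style t2 above shows whether hashing is the operation that was optimized for new-style classes.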
Re: [Python-Dev] Question about dictobject.c:lookdict_string
On 6/11/07, Carl Friedrich Bolz <[EMAIL PROTECTED]> wrote:
> Eyal Lotem wrote:
> > My question is specifically regarding the transition back from
> > lookdict_string (the initial value) to the general lookdict.
> >
> > Currently, when a string-only dict is trying to look up any
> > non-string, it reverts back to a general lookdict.
> >
> > Wouldn't it be better (especially in the more important case of a
> > string-key-only dict), to revert to the generic lookdict when a
> > non-string is inserted to the dict, rather than when one is being
> > searched?
> [...]
> > This does not seem like a significant issue, but as I know a lot of
> > effort went into optimizing dicts, I was wondering if I am missing
> > something here.
>
> Yes, you are: when doing a lookup with a non-string key, that key could
> be an instance of a class that has __hash__ and __eq__ implementations
> that make the key compare equal to some string that is in the
> dictionary. So you need to change to lookdict, otherwise that lookup
> might fail.

Ah, thanks for the clarification. But doesn't it make sense to only revert
that single lookup, and not modify the function ptr until the dict contains
a non-string?

Eyal
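To illustrate Carl Friedrich's point, here is a minimal sketch of a non-string key whose __hash__/__eq__ make it compare equal to a string already in the dict. The `StrAlias` class is hypothetical, invented for this example; it is not from the thread or from CPython:

class StrAlias(object):
    """Non-string key that hashes and compares equal to a given string."""
    def __init__(self, s):
        self.s = s

    def __hash__(self):
        return hash(self.s)

    def __eq__(self, other):
        return self.s == other

d = {'spam': 1}               # string-only dict: fast lookdict_string path
print(d[StrAlias('spam')])    # non-string lookup must still find 'spam'

This is why the lookup (not just the insert) has to fall back to the general lookdict: the fast string-only probe would miss keys like this one.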
[Python-Dev] [RFC] urlparse - parse query facility
Hi all,
This mail is a request for comments on changes to the urlparse module. We
understand that urlparse returns the complete query value as the query
component and does not provide facilities to separate the query into its
components. Users have to use the cgi module (cgi.parse_qs) to get the
query parsed.

There has been a discussion in the past on having a query-string parsing
method available from the urlparse module itself. [1]

To implement the query parsing feature in the urlparse module, we can:

a) import cgi and call the cgi module's parse_qs.
   This approach has problems, as it
   i) imports cgi into the urlparse module, and
   ii) the cgi module in turn imports urllib and urlparse (a circular
   dependency).

b) Implement a standalone query parsing facility in urlparse *AS IN* the
   cgi module.

The method below implements urlparse_qs(url, keep_blank_values,
strict_parsing), which helps in parsing the query component of the URL. It
behaves the same as cgi.parse_qs.
Please let me know your comments on the below code.
--
# _hextochr is urllib's internal map from a pair of hex digits to the
# corresponding character; it is reproduced here so that unquote() is
# self-contained.
_hextochr = dict(('%02x' % i, chr(i)) for i in range(256))
_hextochr.update(('%02X' % i, chr(i)) for i in range(256))

def unquote(s):
    """unquote('abc%20def') -> 'abc def'."""
    res = s.split('%')
    for i in xrange(1, len(res)):
        item = res[i]
        try:
            res[i] = _hextochr[item[:2]] + item[2:]
        except KeyError:
            res[i] = '%' + item
        except UnicodeDecodeError:
            res[i] = unichr(int(item[:2], 16)) + item[2:]
    return "".join(res)

def urlparse_qs(url, keep_blank_values=0, strict_parsing=0):
    """Parse a URL query string and return the components as a dictionary.

    Based on the cgi.parse_qs method. This is a utility function provided
    with urlparse so that users need not use the cgi module for parsing
    the URL query string.

    Arguments:

    url: URL with query string to be parsed

    keep_blank_values: flag indicating whether blank values in
        URL encoded queries should be treated as blank strings.
        A true value indicates that blanks should be retained as
        blank strings. The default false value indicates that
        blank values are to be ignored and treated as if they were
        not included.

    strict_parsing: flag indicating what to do with parsing errors.
        If false (the default), errors are silently ignored.
        If true, errors raise a ValueError exception.
    """
    scheme, netloc, url, params, querystring, fragment = urlparse(url)
    pairs = [s2 for s1 in querystring.split('&') for s2 in s1.split(';')]
    query = []
    for name_value in pairs:
        if not name_value and not strict_parsing:
            continue
        nv = name_value.split('=', 1)
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError, "bad query field: %r" % (name_value,)
            # Handle case of a control-name with no equal sign
            if keep_blank_values:
                nv.append('')
            else:
                continue
        if len(nv[1]) or keep_blank_values:
            name = unquote(nv[0].replace('+', ' '))
            value = unquote(nv[1].replace('+', ' '))
            query.append((name, value))
    result = {}
    for name, value in query:
        if name in result:
            result[name].append(value)
        else:
            result[name] = [value]
    return result
--
Testing:
$ python
Python 2.6a0 (trunk, Jun 10 2007, 12:04:03)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> dir(urlparse)
['BaseResult', 'MAX_CACHE_SIZE', 'ParseResult', 'SplitResult', '__all__',
'__builtins__', '__doc__', '__file__', '__name__', '_parse_cache',
'_splitnetloc', '_splitparams', 'clear_cache', 'non_hierarchical',
'scheme_chars', 'test', 'test_input', 'unquote', 'urldefrag', 'urljoin',
'urlparse', 'urlparse_qs', 'urlsplit', 'urlunparse', 'urlunsplit',
'uses_fragment', 'uses_netloc', 'uses_params', 'uses_query', 'uses_relative']
>>> URL = 'http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=south+africa+travel+cape+town'
>>> print urlparse.urlparse_qs(URL)
{'q': ['south africa travel cape town'], 'oe': ['utf-8'], 'ie': ['UTF-8'],
'hl': ['en']}
>>> print urlparse.urlparse_qs(URL,keep_blank_values=1)
{'q': ['south africa travel cape town'], 'ie': ['UTF-8'], 'oe': ['utf-8'],
'lr': [''], 'hl': ['en']}
>>>
Thanks,
Senthil
[1] http://mail.python.org/pipermail/tutor/2002-August/016823.html
--
O.R.Senthil Kumaran
http://phoe6.livejournal.com
[Python-Dev] Requesting commit access to python sandbox. Cleanup urllib2 - Summer of Code 2007 Project
Hi,

I am a student participant of Google Summer of Code 2007 and I am working
on the cleanup task of urllib2, with Skip as my mentor. I would like to
request commit access to the Python sandbox for implementing the changes
as part of the project. I have attached my SSH public keys.

preferred name: senthil.kumaran

I am following up and adding comments to the urllib-related bugs at the
sf.net page. I would also like to request addition of my SourceForge id,
orsenthil, to the python project, so I can close the defects raised
against the urllib modules.

Summer of Code project:
http://code.google.com/soc/psf/appinfo.html?csaid=E73A6612F80229B6

The project actually commenced on May 28th itself, but there was a delay
from my side in getting started. Ivan Sutherland's essay on Technology and
Courage [1] did some good to me. :-)

Thanks,
Senthil

[1] http://research.sun.com/techrep/Perspectives/smli_ps-1.pdf#search=%22sutherland%20courage%22

--
O.R.Senthil Kumaran
http://phoe6.livejournal.com
Re: [Python-Dev] 2.5 slower than 2.4 for some things?
ocean wrote:
> So probably the hash/comparison mechanism of old/new style classes has
> changed (improved for new-style classes, worse for old-style classes).
> Maybe it was optimized for new-style classes?

Thanks -- it looks like there's a simple solution that will make Plex even
faster! I'll pass this on to the OP.

--
Greg
Re: [Python-Dev] 2.5 slower than 2.4 for some things?
ocean wrote:
>> I've had a report from a user that Plex runs about half
>> as fast in 2.5 as it did in 2.4. In particular, the
>> NFA-to-DFA conversion phase, which does a lot of
>> messing about with dicts representing mappings between
>> sets of states.
That was me.
>> Does anyone in the Ministry for Making Python Blazingly
>> fast happen to know of some change that might have
>> pessimised things in this area?
>
> Hello, I investigated. On my environment, consumed time is
>
> E:\Plex-1.1.5>py24 plex_test2.py
> 0.71065668
>
> E:\Plex-1.1.5>py25 plex_test2.py
> 0.92131335
>
> And after I applied this patch to Plex/Machines, (make `Node' new style
> class)
>
> 62c62
> < class Node:
> ---
>> class Node(object):
>
> E:\Plex-1.1.5>py24 plex_test2.py
> 0.40122888
>
> E:\Plex-1.1.5>py25 plex_test2.py
> 0.350999832153
>
Nice!

Meanwhile I tried to replace the parsing I did with Plex with re.Scanner,
and again there is a remarkable speed difference; again Python 2.5 is
slower:
try:
    from re import Scanner
except ImportError:
    from sre import Scanner

pars = {}
order = []
count = 0

def par(scanner, name):
    global count, order, pars
    if name in ['caller', 'e', 'pi']:
        return name
    if name not in pars:
        pars[name] = ('ns', count)
        order.append(name)
        ret = 'a[%d]' % count
        count += 1
    else:
        ret = 'a[%d]' % order.index(name)
    return ret

scanner = Scanner([
    (r"x", lambda y, x: x),
    (r"[a-zA-Z]+\.", lambda y, x: x),
    (r"[a-z]+\(", lambda y, x: x),
    (r"[a-zA-Z_]\w*", par),
    (r"\d+\.\d*", lambda y, x: x),
    (r"\d+", lambda y, x: x),
    (r"\+|-|\*|/", lambda y, x: x),
    (r"\s+", None),
    (r"\)+", lambda y, x: x),
    (r"\(+", lambda y, x: x),
    (r",", lambda y, x: x),
])

import profile
import pstats

def run():
    arg = '+amp*exp(-(x-pos)/fwhm)'
    for i in range(100):
        scanner.scan(arg)

profile.run('run()', 'profscanner')
p = pstats.Stats('profscanner')
p.strip_dirs()
p.sort_stats('cumulative')
p.print_stats()
Christian
[Python-Dev] minimal configuration for python on a DSP (C64xx family of TI)
Dear all,

We want to make Python run on DSP processors (the C64xx family from TI).
What would be a minimal configuration (of modules, C files, ...) to make it
start running (without all of the things useful to add once it runs)?

Any hints welcome,

Roland Geibel
[EMAIL PROTECTED]
[Python-Dev] Representation of 'nan'
The repr() for a float of 'inf' or 'nan' is generated as a string (not a
string literal). Perhaps this is only important in how one defines repr().
I've filed a bug, but am not sure if there is a clear solution.
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1732212&group_id=5470
# Repr with a tuple of floats
>>> repr((1.0, 2.0, 3.0))
'(1.0, 2.0, 3.0)'
>>> eval(_)
(1.0, 2.0, 3.0)

# Repr with a tuple of floats, plus nan
>>> repr((1.0, float('nan'), 3.0))
'(1.0, nan, 3.0)'
>>> eval(_)
NameError: name 'nan' is not defined
There are a few alternatives I can think of that are fairly clean. I think
I'd prefer any of these over the current 'nan' implementation. I don't
think it is worth adding a nan literal to the language, but something could
be changed so that the repr of nan meant something.

The best option in my opinion would be adding attributes to float, so that
float.nan, float.inf, and float.ninf are accessible. This could also help
with the odd situations of checking for these out-of-range values. With
that in place, repr could return 'float.nan' instead of 'nan'. This would
make the repr string evaluatable again (in contexts where __builtins__ has
not been molested).

Another option could be for repr to return 'float("nan")' for these, which
would also evaluate correctly, but this doesn't seem a clean use for repr.

Is this worth even changing? It's just an irregularity that has come up and
surprised a few of us developers.
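A small sketch of the irregularity and the round-trip that does work today; note that `float.nan` above is only a proposal, while `float('nan')` and the x != x test for NaN are existing behavior:

x = float('nan')

# repr() yields the bare string 'nan', which is not a valid expression:
try:
    eval(repr(x))          # eval('nan') -> NameError: 'nan' not defined
except NameError as e:
    print('eval failed:', e)

# The constructor, by contrast, does accept the repr string directly:
same = float(repr(x))
assert same != same        # NaN is the only value unequal to itself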
Re: [Python-Dev] TLSAbruptCloseError
> Any thoughts?

My main thought: this posting is off-topic for python-dev. This list is for
the development of Python itself; use comp.lang.python for discussing
development *with* Python. However, that may still be the wrong place -
perhaps you had better ask in a Java forum?

Regards,
Martin
