Re: re.search much slower then grep on some regular expressions

2008-07-10 Thread Kris Kennaway
J. Cliff Dyer wrote: On Wed, 2008-07-09 at 12:29 -0700, samwyse wrote: On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote: samwyse wrote: You might want to look at Plex. http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ "Another advantage of Plex is that it compiles all of the

Re: re.search much slower then grep on some regular expressions

2008-07-10 Thread Sebastian "lunar" Wiesner
Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]>: > On Mon, 07 Jul 2008 16:44:22 +0200, Sebastian \"lunar\" Wiesner wrote: > >> Mark Wooding <[EMAIL PROTECTED]>: >> >>> Sebastian "lunar" Wiesner <[EMAIL PROTECTED]> wrote: >>> # perl -e '("a" x 10) =~ /^(ab?)*$/;' zsh: segmentation fau

Re: re.search much slower then grep on some regular expressions

2008-07-10 Thread J. Cliff Dyer
On Wed, 2008-07-09 at 12:29 -0700, samwyse wrote: > On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote: > > samwyse wrote: > > > > You might want to look at Plex. > > >http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ > > > > > "Another advantage of Plex is that it compiles all of

Re: re.search much slower then grep on some regular expressions

2008-07-10 Thread Kris Kennaway
John Machin wrote: Uh-huh ... try this, then: http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ You could use this to find the "Str" cases and the prefixes of the "re" cases (which seem to be no more complicated than 'foo.*bar.*zot') and use something slower like Python's re to search the
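
Editor's sketch of the two-stage idea described above: run a cheap literal test first and fall back to Python's re only on lines that pass it. This is a minimal illustration, not the ahocorasick module's API. It swaps in the plain `in` substring test for an Aho-Corasick automaton, and the function name `search_with_prefilter`, the rule layout, and the sample pattern are assumptions made for the example.

```python
import re


def search_with_prefilter(lines, rules):
    """Return (line number, pattern) pairs for every rule that matches.

    rules is a list of (literal, compiled_regex) pairs, where literal is a
    fixed string that any match of the regex must contain (hypothetical
    layout chosen for this sketch).
    """
    hits = []
    for lineno, line in enumerate(lines, 1):
        for literal, regex in rules:
            # Cheap substring test first; the slower regex engine only
            # runs on lines that contain the required literal.
            if literal in line and regex.search(line):
                hits.append((lineno, regex.pattern))
    return hits


if __name__ == "__main__":
    rules = [("foo", re.compile(r"foo.*bar.*zot"))]
    sample = ["foo then bar then zot", "foo only", "nothing here"]
    print(search_with_prefilter(sample, rules))
```

With many literals, a real multi-string matcher would replace the inner `in` test; the control flow stays the same.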

Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread John Machin
On Jul 9, 10:06 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote: > John Machin wrote: > >> Hmm, unfortunately it's still orders of magnitude slower than grep in my > >> own application that involves matching lots of strings and regexps > >> against large files (I killed it after 400 seconds, compared t

Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread Kris Kennaway
samwyse wrote: On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote: samwyse wrote: You might want to look at Plex. http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ "Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that's done

Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread samwyse
On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote: > samwyse wrote: > > You might want to look at Plex. > >http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ > > > "Another advantage of Plex is that it compiles all of the regular > > expressions into a single DFA. Once that's done,
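
Editor's sketch, loosely related to the quoted Plex claim: with Python's own re module, several patterns can at least be combined into one alternation so that each line is scanned by a single compiled pattern. This does not build a DFA as Plex is described as doing, since re remains a backtracking engine. The rule names and patterns below are hypothetical.

```python
import re

# Hypothetical rules for illustration: (name, pattern) pairs.
RULES = [
    ("error", r"error: .*"),
    ("warning", r"warning: .*"),
]

# One alternation with a named group per rule, so a single search call
# reports which rule matched.
COMBINED = re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in RULES))


def classify(line):
    """Return the name of the rule matching line, or None if none matches."""
    match = COMBINED.search(line)
    return match.lastgroup if match else None


if __name__ == "__main__":
    for text in ("foo.c:1: error: missing semicolon", "all good"):
        print(text, "->", classify(text))
```

The named groups are top-level and contain no nested groups, so `lastgroup` identifies the alternative that matched.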

Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread Kris Kennaway
Jeroen Ruigrok van der Werven wrote: -On [20080709 14:08], Kris Kennaway ([EMAIL PROTECTED]) wrote: It's compiler/build output. Sounds like the FreeBSD ports build cluster. :) Yes indeed! Kris, have you tried a PGO build of Python with your specific usage? I cannot guarantee it will signif

Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread Jeroen Ruigrok van der Werven
-On [20080709 14:08], Kris Kennaway ([EMAIL PROTECTED]) wrote: >It's compiler/build output. Sounds like the FreeBSD ports build cluster. :) Kris, have you tried a PGO build of Python with your specific usage? I cannot guarantee it will significantly speed things up though. Also, a while ago I di

Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread Kris Kennaway
John Machin wrote: Hmm, unfortunately it's still orders of magnitude slower than grep in my own application that involves matching lots of strings and regexps against large files (I killed it after 400 seconds, compared to 1.5 for grep), and that's leaving aside the much longer compilation time

Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread John Machin
On Jul 9, 2:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote: > samwyse wrote: > > On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]> > > wrote: > >> What can be the cause of the large difference between re.search and > >> grep? > > >> While doing a simple grep: > >> grep '[^ "=]*/' input      

Re: re.search much slower then grep on some regular expressions

2008-07-08 Thread Henning Thornblad
On Jul 8, 2:48 am, John Machin <[EMAIL PROTECTED]> wrote: > On Jul 8, 2:51 am, Henning Thornblad <[EMAIL PROTECTED]> > wrote: > > > > > When trying to find an alternative way of solving my problem i found > > that running this script: > > > #!/usr/bin/env python > > > import re > > > row="" > > for

Re: re.search much slower then grep on some regular expressions

2008-07-08 Thread Kris Kennaway
samwyse wrote: On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]> wrote: What can be the cause of the large difference between re.search and grep? While doing a simple grep: grep '[^ "=]*/' input (input contains 156.000 a in one row) doesn't even take a second. Is this

Re: re.search much slower then grep on some regular expressions

2008-07-08 Thread samwyse
On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]> wrote: > What can be the cause of the large difference between re.search and > grep? > While doing a simple grep: > grep '[^ "=]*/' input                  (input contains 156.000 a in > one row) > doesn't even take a second. > > Is this a bu

Re: re.search much slower then grep on some regular expressions

2008-07-07 Thread John Machin
On Jul 8, 2:51 am, Henning Thornblad <[EMAIL PROTECTED]> wrote: > When trying to find an alternative way of solving my problem i found > that running this script: > > #!/usr/bin/env python > > import re > > row="" > for a in range(156000): > row+="a" > print "How many, dude?" > print re.search(

Re: re.search much slower then grep on some regular expressions

2008-07-07 Thread Kris Kennaway
Paddy wrote: On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote: Henning_Thornblad wrote: What can be the cause of the large difference between re.search and grep? grep uses a smarter algorithm ;) This script takes about 5 min to run on my computer: #!/usr/bin/env python import re ro

Re: re.search much slower then grep on some regular expressions

2008-07-07 Thread Henning Thornblad
When trying to find an alternative way of solving my problem I found that running this script:
    #!/usr/bin/env python
    import re
    row=""
    for a in range(156000):
        row+="a"
    print "How many, dude?"
    print re.search('/[^ "=]*',row)
(the / has moved) wouldn't take even a second (The re.search part o

Re: re.search much slower then grep on some regular expressions

2008-07-07 Thread Marc 'BlackJack' Rintsch
On Mon, 07 Jul 2008 16:44:22 +0200, Sebastian \"lunar\" Wiesner wrote: > Mark Wooding <[EMAIL PROTECTED]>: > >> Sebastian "lunar" Wiesner <[EMAIL PROTECTED]> wrote: >> >>> # perl -e '("a" x 10) =~ /^(ab?)*$/;' >>> zsh: segmentation fault perl -e '("a" x 10) =~ /^(ab?)*$/;' >> >> (Did y

Re: re.search much slower then grep on some regular expressions

2008-07-07 Thread Sebastian "lunar" Wiesner
Mark Wooding <[EMAIL PROTECTED]>: > Sebastian "lunar" Wiesner <[EMAIL PROTECTED]> wrote: > >> # perl -e '("a" x 10) =~ /^(ab?)*$/;' >> zsh: segmentation fault perl -e '("a" x 10) =~ /^(ab?)*$/;' > > (Did you really run that as root?) How come, that you think so? -- Freedom is always

Re: re.search much slower then grep on some regular expressions

2008-07-06 Thread Mark Wooding
Sebastian "lunar" Wiesner <[EMAIL PROTECTED]> wrote: > # perl -e '("a" x 10) =~ /^(ab?)*$/;' > zsh: segmentation fault perl -e '("a" x 10) =~ /^(ab?)*$/;' (Did you really run that as root?) > It'd be interesting to know, how CL-PPCRE performs here (I don't know this > library). Stack o

Re: re.search much slower then grep on some regular expressions

2008-07-06 Thread Terry Reedy
Sebastian "lunar" Wiesner wrote: I completely agree. I'd just believe, that the combination of some finite state machine for "classic" expressions with some backtracking code is terribly hard to implement. But I'm not an expert in this, probably some libraries out there already do this. In

Re: re.search much slower then grep on some regular expressions

2008-07-06 Thread Sebastian "lunar" Wiesner
Mark Wooding <[EMAIL PROTECTED]>: > Sebastian "lunar" Wiesner <[EMAIL PROTECTED]> wrote: > >> I just wanted to illustrate, that the speed of the given search is >> somehow related to the complexity of the engine. >> >> Btw, other pcre implementation are as slow as Python or "grep -P". I >> tried

Re: re.search much slower then grep on some regular expressions

2008-07-06 Thread Mark Wooding
Sebastian "lunar" Wiesner <[EMAIL PROTECTED]> wrote: > I just wanted to illustrate, that the speed of the given search is somehow > related to the complexity of the engine. > > Btw, other pcre implementation are as slow as Python or "grep -P". I tried > a sample C++-code using pcre++ (a wrapper

Re: re.search much slower then grep on some regular expressions

2008-07-06 Thread [EMAIL PROTECTED]
On Jul 5, 11:13 am, Mark Dickinson <[EMAIL PROTECTED]> wrote: > Apparently, grep and Tcl convert a regex to a finite state machine. ... > But not all PCREs can be converted to a finite state machine ... > Part of the problem is a lack of agreement on what > 'regular expression' means. Strictly sp
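
Editor's sketch of the finite-state point for the pattern from this thread. It is a hand-written single-pass scan that keeps two pieces of state, not a description of how grep or Tcl are implemented, and the function name `linear_search` is an assumption. It returns the same span as re.search for `[^ "=]*/` while reading each character once.

```python
import re

PATTERN = re.compile(r'[^ "=]*/')
DISALLOWED = ' "='


def linear_search(text):
    """Return the (start, end) span re.search would give for [^ "=]*/.

    State kept while scanning: where the current run of allowed characters
    began, and the index of the last '/' seen in that run (-1 if none).
    """
    run_start = 0
    last_slash = -1
    for i, ch in enumerate(text):
        if ch in DISALLOWED:
            if last_slash >= 0:
                # The run that just ended contained a '/': leftmost match.
                return (run_start, last_slash + 1)
            run_start = i + 1
        elif ch == "/":
            last_slash = i
    if last_slash >= 0:
        return (run_start, last_slash + 1)
    return None


if __name__ == "__main__":
    row = "a" * 156000
    print(linear_search(row))
    print(linear_search('x="a/b"'), PATTERN.search('x="a/b"').span())
```

The scan never revisits a character, so its cost grows linearly with the length of the line, including on the 156000-character input from the original post.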

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Sebastian "lunar" Wiesner
Terry Reedy <[EMAIL PROTECTED]>: > Mark Dickinson wrote: >> On Jul 5, 1:54 pm, Carl Banks <[EMAIL PROTECTED]> wrote: > >> Part of the problem is a lack of agreement on what >> 'regular expression' means. > > Twenty years ago, there was. Calling a extended re-derived grammar > expression like Pe

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Terry Reedy
Mark Dickinson wrote: On Jul 5, 1:54 pm, Carl Banks <[EMAIL PROTECTED]> wrote: Part of the problem is a lack of agreement on what 'regular expression' means. Twenty years ago, there was. Calling an extended re-derived grammar expression like Perl's a 'regular-expression' is a bit like cal

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Paddy
On Jul 5, 4:13 pm, Mark Dickinson <[EMAIL PROTECTED]> wrote: > It seems like an appropriate moment to point out *this* paper: > > http://swtch.com/~rsc/regexp/regexp1.html > That's the one! Thanks Mark. - Paddy. -- http://mail.python.org/mailman/listinfo/python-list

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Mark Dickinson
On Jul 5, 1:54 pm, Carl Banks <[EMAIL PROTECTED]> wrote: > I don't think you've illustrated that at all.  What you've illustrated > is that one implementation of regexp optimizes something that another > doesn't.  It might be due to differences in complexity; it might not. > (Maybe there's somethin

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread bearophileHUGS
Paddy: > You could argue that if the costly RE features are not used then maybe > simpler, faster algorithms should be automatically swapped in but Many Python implementations contain a TCL interpreter. TCL REs may be better than Python ones, so it can be interesting to benchmark the same RE

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Carl Banks
On Jul 5, 6:44 am, "Sebastian \"lunar\" Wiesner" <[EMAIL PROTECTED]> wrote: > Carl Banks <[EMAIL PROTECTED]>: > > > > > On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner" > > <[EMAIL PROTECTED]> wrote: > >> Paddy <[EMAIL PROTECTED]>: > > >> > On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote:

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Sebastian "lunar" Wiesner
Carl Banks <[EMAIL PROTECTED]>: > On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner" > <[EMAIL PROTECTED]> wrote: >> Paddy <[EMAIL PROTECTED]>: >> >> >> >> > On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote: >> >> Henning_Thornblad wrote: >> >> > What can be the cause of the large difference

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Carl Banks
On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner" <[EMAIL PROTECTED]> wrote: > Paddy <[EMAIL PROTECTED]>: > > > > > On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote: > >> Henning_Thornblad wrote: > >> > What can be the cause of the large difference between re.search and > >> > grep? > > >> g

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Sebastian "lunar" Wiesner
Paddy <[EMAIL PROTECTED]>: > On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote: >> Henning_Thornblad wrote: >> > What can be the cause of the large difference between re.search and >> > grep? >> >> grep uses a smarter algorithm ;) >> >> >> >> > This script takes about 5 min to run on my com

Re: re.search much slower then grep on some regular expressions

2008-07-05 Thread Paddy
On Jul 5, 7:01 am, Peter Otten <[EMAIL PROTECTED]> wrote: > Paddy wrote: > > It is not a smarter algorithm that is used in grep. Python RE's have > > more capabilities than grep RE's which need a slower, more complex > > algorithm. > > So you're saying the Python algo is alternatively gifted... > >

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Peter Otten
Filipe Fernandes wrote: > but why would you say this particular > regex isn't common enough in real code? As Carl says, it's not just the regex, it's the combination with a long line that exposes the re library's weakness. Peter

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Peter Otten
Paddy wrote: > It is not a smarter algorithm that is used in grep. Python RE's have > more capabilities than grep RE's which need a slower, more complex > algorithm. So you're saying the Python algo is alternatively gifted... Peter

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Peter Otten
John Nagle wrote: > Henning_Thornblad wrote: >> What can be the cause of the large difference between re.search and >> grep? >> >> This script takes about 5 min to run on my computer: >> #!/usr/bin/env python >> import re >> >> row="" >> for a in range(156000): >> row+="a" >> print re.search

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Peter Pearson
On Fri, 4 Jul 2008 20:34:03 -0700 (PDT), Carl Banks wrote: > On Jul 4, 4:43 pm, "Filipe Fernandes" <[EMAIL PROTECTED]> wrote: >> On Fri, Jul 4, 2008 at 8:36 AM, Peter Otten <[EMAIL PROTECTED]> wrote: >> > Henning_Thornblad wrote: >> >> >> This script takes about 5 min to run on my computer: >> >> #

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread John Nagle
Henning_Thornblad wrote:
What can be the cause of the large difference between re.search and grep?
This script takes about 5 min to run on my computer:
    #!/usr/bin/env python
    import re
    row=""
    for a in range(156000):
        row+="a"
    print re.search('[^ "=]*/',row)
While doing a simple grep: grep '

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Carl Banks
On Jul 4, 4:43 pm, "Filipe Fernandes" <[EMAIL PROTECTED]> wrote: > On Fri, Jul 4, 2008 at 8:36 AM, Peter Otten <[EMAIL PROTECTED]> wrote: > > Henning_Thornblad wrote: > > >> What can be the cause of the large difference between re.search and > >> grep? > > > grep uses a smarter algorithm ;) > > >>

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Filipe Fernandes
On Fri, Jul 4, 2008 at 8:36 AM, Peter Otten <[EMAIL PROTECTED]> wrote: > Henning_Thornblad wrote: > >> What can be the cause of the large difference between re.search and >> grep? > > grep uses a smarter algorithm ;) > >> This script takes about 5 min to run on my computer: >> #!/usr/bin/env python

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Paddy
On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote: > Henning_Thornblad wrote: > > What can be the cause of the large difference between re.search and > > grep? > > grep uses a smarter algorithm ;) > > > > > This script takes about 5 min to run on my computer: > > #!/usr/bin/env python > > im

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Peter Otten
Henning_Thornblad wrote: > What can be the cause of the large difference between re.search and > grep? grep uses a smarter algorithm ;) > This script takes about 5 min to run on my computer: > #!/usr/bin/env python > import re > > row="" > for a in range(156000): > row+="a" > print re.sear

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Bruno Desthuilliers
Bruno Desthuilliers wrote:
Henning_Thornblad wrote:
What can be the cause of the large difference between re.search and grep?
This script takes about 5 min to run on my computer:
    #!/usr/bin/env python
    import re
    row=""
    for a in range(156000):
        row+="a"
    print re.search('[^ "=]*/',row)

Re: re.search much slower then grep on some regular expressions

2008-07-04 Thread Bruno Desthuilliers
Henning_Thornblad wrote:
What can be the cause of the large difference between re.search and grep?
This script takes about 5 min to run on my computer:
    #!/usr/bin/env python
    import re
    row=""
    for a in range(156000):
        row+="a"
    print re.search('[^ "=]*/',row)
While doing a simple grep: gre