Re: Regex Speed

2007-02-23 Thread John Machin
On Feb 24, 11:51 am, greg <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > the author of this citation states that
> > any regex can be expressed as a DFA machine. However ...
> >
> > I appear to have found one example of a regex
> > which breaks this assumption.
> >
> > "ab+c|abd"
> >
> > A

Re: Regex Speed

2007-02-23 Thread John Machin
On Feb 24, 10:15 am, [EMAIL PROTECTED] wrote:
> On Feb 21, 10:34 am, [EMAIL PROTECTED] wrote:
> > On Feb 20, 6:14 pm, Pop User <[EMAIL PROTECTED]> wrote:
> > > http://swtch.com/~rsc/regexp/regexp1.html
>
> Going back a bit on a tangent, the author of this citation states that
> any regex can be ex

Re: Regex Speed

2007-02-23 Thread greg
[EMAIL PROTECTED] wrote:
> the author of this citation states that
> any regex can be expressed as a DFA machine. However ...
> I appear to have found one example of a regex
> which breaks this assumption.
>
> "ab+c|abd"
>
> Am I correct?

No. Any NFA can be converted to an equivalent DFA. This
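
To make greg's point concrete, here is an equivalent DFA for "ab+c|abd" written out by hand as a Python transition table (a sketch; the state numbering is mine, not from the thread). The trick is that the state reached after "ab" simply remembers that both alternatives are still alive, so no nondeterminism is needed:

# DFA equivalent to the regex "ab+c|abd", as a dict of dicts.
#   0: start                     1: seen "a"
#   2: seen "ab" (both branches still possible)
#   3: seen "ab" plus more "b"s (only "ab+c" still possible)
#   "OK": accepting state
DFA = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"b": 3, "c": "OK", "d": "OK"},
    3: {"b": 3, "c": "OK"},
}

def dfa_match(s):
    """True if the whole of s matches ab+c|abd."""
    state = 0
    for ch in s:
        if state == "OK":            # leftover input after a full match
            return False
        state = DFA[state].get(ch)
        if state is None:            # no transition: reject
            return False
    return state == "OK"

assert dfa_match("abc") and dfa_match("abbbc") and dfa_match("abd")
assert not (dfa_match("ab") or dfa_match("abbd") or dfa_match("abcd"))

The subset construction does the same thing mechanically: each DFA state stands for the set of NFA states reachable on the input read so far, which is why every classical regex has an equivalent DFA.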

Re: Regex Speed

2007-02-23 Thread garrickp
On Feb 21, 10:34 am, [EMAIL PROTECTED] wrote:
> On Feb 20, 6:14 pm, Pop User <[EMAIL PROTECTED]> wrote:
> > http://swtch.com/~rsc/regexp/regexp1.html

Going back a bit on a tangent, the author of this citation states that any regex can be expressed as a DFA machine. However, while investigating thi

Re: Regex Speed

2007-02-22 Thread Szabolcs Nagy
> Well, just as an idea, there is a portable C library for this at
> http://laurikari.net/tre/ released under LGPL. If one is willing to
> give up PCRE extensions for speed, it might be worth the work to
> wrap this library using SWIG.

Actually, there is a python binding in the tre source with an
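
For the curious, the sketch below is adapted from the example bundled with the TRE source (python/example.py); the module name tre, the Fuzzyness object, and the search() signature are taken from that example rather than verified against a current release, so treat them as assumptions and check your copy:

# Approximate ("fuzzy") matching via TRE's bundled Python binding.
import tre

fz = tre.Fuzzyness(maxerr=3)     # allow up to 3 errors in a match
pt = tre.compile("Don(ald( Ervin)?)? Knuth", tre.EXTENDED)

# Note the deliberate misspellings: TRE still finds the name.
data = "In 1965 Donnald Erwin Kuth published his first paper."
m = pt.search(data, fz)
if m:
    print(m.groups())            # offsets of the matched groups
    print(m[0])                  # the (approximately) matched text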

Re: Regex Speed

2007-02-21 Thread Kirk Sluder
In article <[EMAIL PROTECTED]>, "John Machin" <[EMAIL PROTECTED]> wrote:
> Getting back to the "It would be nice ..." bit: yes, it would be nice
> to have even more smarts in re, but who's going to do it? It's not a
> "rainy Sunday afternoon" job :-)

Well, just as an idea, there is a portable C

Re: Regex Speed

2007-02-21 Thread garrickp
On Feb 20, 6:14 pm, Pop User <[EMAIL PROTECTED]> wrote:
> It's very hard to beat grep, depending on the nature of the regex you
> are searching with. The regex engines in python/perl/php/ruby have
> traded the speed of grep/awk for the ability to do more complex
> searches.
>
> http://swtch.com/~rsc/

Re: Regex Speed

2007-02-21 Thread Pop User
John Machin wrote:
> Or an re module using a Glushkov NFA simulated by bit parallelism ... see
> http://citeseer.ist.psu.edu/551772.html
> (which Russ Cox (author of the paper you cited) seems not to have
> read).

NR-grep looks interesting, I'll read that. Thanks.

> Cox uses a "pathological regex" (reg
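
That pathological case is easy to reproduce: Cox matches the regex a?^n a^n (n copies of "a?" followed by n copies of "a") against the string a^n. A backtracking engine such as Python's re tries each "a?" both ways and does roughly 2**n work before the match succeeds, so the time about doubles with every increment of n, while a DFA or NFA simulation stays linear. A quick timing loop (the bound of 25 is arbitrary, just to keep the run short):

import re
import time

# Watch the match time roughly double as n grows.
for n in range(1, 25):
    pat = re.compile("a?" * n + "a" * n)
    t0 = time.time()
    pat.match("a" * n)           # matches, but only after backtracking
    print("n=%2d  %.3fs" % (n, time.time() - t0))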

Re: Regex Speed

2007-02-21 Thread John Machin
On Feb 21, 12:14 pm, Pop User <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > While creating a log parser for fairly large logs, we have run into an
> > issue where the time to process was unacceptably long (upwards
> > of 5 minutes for 1-2 million lines of logs). In contrast, using

Re: Regex Speed

2007-02-21 Thread Pop User
[EMAIL PROTECTED] wrote:
> While creating a log parser for fairly large logs, we have run into an
> issue where the time to process was unacceptably long (upwards
> of 5 minutes for 1-2 million lines of logs). In contrast, using the
> Linux tool grep, the same search would complete in a matter

Re: Regex Speed

2007-02-20 Thread John Machin
On Feb 21, 11:40 am, [EMAIL PROTECTED] wrote:
> On Feb 20, 4:15 pm, "John Machin" <[EMAIL PROTECTED]> wrote:
>
> > What is an "exclusionary set"? It would help enormously if you were to
> > tell us what the regex actually is. Feel free to obfuscate any
> > proprietary constant strings, of course.
>

Re: Regex Speed

2007-02-20 Thread Gabriel Genellina
On Tue, 20 Feb 2007 21:40:40 -0300, <[EMAIL PROTECTED]> wrote:
> My apologies. I don't have specifics right now, but it's something
> along the lines of this:
>
> error_list = re.compile(r"error|miss|issing|inval|nvalid|math")
>
> Yes, I know, these are not re expressions, but the requirements f
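
Since every branch of that alternation is a fixed substring, one cheap experiment is to skip the regex engine entirely and test each word with "in", which uses a fast substring search; a sketch (the function name is mine):

WORDS = ("error", "miss", "issing", "inval", "nvalid", "math")

def is_error_line(line):
    # Plain substring tests; for a short list of fixed strings this
    # often beats a regex alternation in CPython.
    for w in WORDS:
        if w in line:
            return True
    return False

assert is_error_line("math overflow in module foo")
assert not is_error_line("all systems nominal")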

Re: Regex Speed

2007-02-20 Thread Alejandro Dubrovsky
Steve Holden wrote:
> John Machin wrote:
> [...]
>>
>> To help you, we need either (a) basic information or (b) crystal
>> balls.
> [...]
>
> How on earth would having glass testicles help us help him?

John, of course, meant spheres of doped single crystal silicon on which we could simulate

Re: Regex Speed

2007-02-20 Thread Steve Holden
John Machin wrote:
[...]
>
> To help you, we need either (a) basic information or (b) crystal
> balls.
[...]

How on earth would having glass testicles help us help him?

regards
 Steve
--
Steve Holden  +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
S

Re: Regex Speed

2007-02-20 Thread garrickp
On Feb 20, 4:15 pm, "John Machin" <[EMAIL PROTECTED]> wrote:
> What is an "exclusionary set"? It would help enormously if you were to
> tell us what the regex actually is. Feel free to obfuscate any
> proprietary constant strings, of course.

My apologies. I don't have specifics right now, but it'

Re: Regex Speed

2007-02-20 Thread Alejandro Dubrovsky
[EMAIL PROTECTED] wrote:
> While creating a log parser for fairly large logs, we have run into an
> issue where the time to process was unacceptably long (upwards
> of 5 minutes for 1-2 million lines of logs). In contrast, using the
> Linux tool grep, the same search would complete in a matte

Re: Regex Speed

2007-02-20 Thread John Machin
On Feb 21, 8:29 am, [EMAIL PROTECTED] wrote:
> While creating a log parser for fairly large logs, we have run into an
> issue where the time to process was unacceptably long (upwards
> of 5 minutes for 1-2 million lines of logs). In contrast, using the
> Linux tool grep, the sam

Regex Speed

2007-02-20 Thread garrickp
While creating a log parser for fairly large logs, we have run into an issue where the time to process was unacceptably long (upwards of 5 minutes for 1-2 million lines of logs). In contrast, using the Linux tool grep, the same search would complete in a matter of seconds. The search we used
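
For reference, the scan being described presumably has roughly the shape below, borrowing the indicative pattern quoted earlier in the thread; the file name is a placeholder, since the post doesn't give the real one. The usual first things to check are that the regex is compiled once, outside the loop, and that the file is streamed line by line rather than slurped whole:

import re

# Pattern from earlier in the thread; "app.log" is a stand-in name.
pattern = re.compile(r"error|miss|issing|inval|nvalid|math")

matches = 0
f = open("app.log")
for line in f:                   # stream the file one line at a time
    if pattern.search(line):
        matches += 1
f.close()
print("%d matching lines" % matches)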