Re: Regular Expressions: large amount of or's

2005-03-23 Thread Daniel Yoo
: Done. 'startpos' and other bug fixes are in Release 0.7: : http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ahocorasick-0.7.tar.gz Ok, I stopped working on the Aho-Corasick module for a while, so I've just bumped the version number to 0.8 and posted it up on PyPI. I did add some prelimin

Re: Regular Expressions: large amount of or's

2005-03-14 Thread Daniel Yoo
Scott David Daniels <[EMAIL PROTECTED]> wrote: : I have a (very high speed) modified Aho-Corasick machine that I sell. : The calling model that I found works well is: : def chases(self, sourcestream, ...): : '''A generator taking a generator of source blocks, : yielding (
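Scott's snippet stops before saying what the generator yields, so the sketch below assumes it yields (start_offset, keyword) pairs. It is a minimal pure-Python Aho-Corasick automaton. It is not his commercial machine and not the ahocorasick extension module; KeywordMachine and chase are hypothetical names. The useful property of the generator-of-blocks calling model is that the automaton state carries over from one source block to the next. A keyword split across a block boundary is therefore still found, with no buffering or re-scanning.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Tuple


class KeywordMachine:
    """Minimal Aho-Corasick automaton (hypothetical sketch; keywords must be non-empty)."""

    def __init__(self, words: Iterable[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie edges; state 0 is the root
        self.fail: List[int] = [0]              # failure link for each state
        self.out: List[List[str]] = [[]]        # keywords recognized in each state
        for word in words:
            state = 0
            for ch in word:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(word)
        # Breadth-first pass: each failure link points at the state for the
        # longest proper suffix that is also a prefix of some keyword.
        queue = deque(self.goto[0].values())  # depth-1 states keep fail = root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # A state also reports every keyword its suffix state reports.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def chase(self, blocks: Iterable[str]) -> Iterator[Tuple[int, str]]:
        """Yield (start_offset, keyword) for each occurrence in a stream of
        text blocks; the state survives block boundaries."""
        state, pos = 0, 0
        for block in blocks:
            for ch in block:
                while state and ch not in self.goto[state]:
                    state = self.fail[state]
                state = self.goto[state].get(ch, 0)
                pos += 1  # pos is now the offset just past ch
                for word in self.out[state]:
                    yield pos - len(word), word


machine = KeywordMachine(["he", "she", "his", "hers"])
# "she" straddles the boundary between the two blocks.
print(list(machine.chase(["ush", "ers"])))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```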

Re: Regular Expressions: large amount of or's

2005-03-14 Thread Daniel Yoo
Daniel Yoo <[EMAIL PROTECTED]> wrote: : John Machin <[EMAIL PROTECTED]> wrote: : : tree.search("I went to alpha beta the other day to pick up some spam") : : could use a startpos (default=0) argument for efficiently restarting : : the search after finding the first match : Ok, that's easy to fi

Re: Regular Expressions: large amount of or's

2005-03-13 Thread Scott David Daniels
Daniel Yoo wrote: John Machin <[EMAIL PROTECTED]> wrote: : tree.search("I went to alpha beta the other day to pick up some spam") : could use a startpos (default=0) argument for efficiently restarting : the search after finding the first match Ok, that's easy to fix. I'll do that tonight. I have a

Re: Regular Expressions: large amount of or's

2005-03-13 Thread Daniel Yoo
John Machin <[EMAIL PROTECTED]> wrote: : tree.search("I went to alpha beta the other day to pick up some spam") : could use a startpos (default=0) argument for efficiently restarting : the search after finding the first match Ok, that's easy to fix. I'll do that tonight. -- http://mail.python
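For comparison, the standard re module already supports this calling style. A compiled pattern's search() takes an optional pos argument, so a scan can resume where the previous match ended without slicing (and copying) the rest of the string. The sketch below uses that as a stand-in. It is not the ahocorasick module's API, and all_matches is a hypothetical helper.

```python
import re
from typing import Iterator, Tuple


def all_matches(pattern: "re.Pattern[str]", text: str, startpos: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans of successive non-overlapping matches,
    restarting each search at an offset instead of slicing the text."""
    while True:
        m = pattern.search(text, startpos)
        if m is None:
            return
        yield m.span()
        # Resume after this match; step one past an empty match to avoid looping.
        startpos = m.end() if m.end() > m.start() else m.end() + 1


pattern = re.compile("alpha beta|spam")
text = "I went to alpha beta the other day to pick up some spam"
print(list(all_matches(pattern, text)))  # [(10, 20), (51, 55)]
```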

Re: Regular Expressions: large amount of or's

2005-03-13 Thread John Machin
Daniel Yoo wrote: > > Here you go: > > http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ > > This provides an 'ahocorasick' Python C extension module for doing > matching on a set of keywords. I'll start writing out the package > announcements tomorrow. > Looks good. However: tree.sea

Re: Regular Expressions: large amount of or's

2005-03-13 Thread Daniel Yoo
: Otherwise, you may want to look at a specialized data structure for : doing multiple keyword matching; I had an older module that wrapped : around a suffix tree: :http://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/ : It looks like other folks, thankfully, have written other : implementat
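A suffix tree indexes the text. For a fixed set of keywords, a simpler relative is a trie built over the keywords themselves and walked from every start position in the text. The sketch below uses that trie approach as a substitute technique; it is not Daniel's suffix_trees module. The names build_trie, find_all and the _END sentinel are hypothetical. The cost is roughly len(text) times the length of the longest keyword, however many keywords there are. Aho-Corasick goes further and removes the restart at each position.

```python
from typing import Iterable, List, Tuple

_END = ""  # sentinel key: a single text character can never equal ""


def build_trie(words: Iterable[str]) -> dict:
    """Nested-dict trie over the keywords; _END marks where a keyword stops."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = word
    return root


def find_all(trie: dict, text: str) -> List[Tuple[int, str]]:
    """Return (start, keyword) for every occurrence, overlaps included."""
    hits: List[Tuple[int, str]] = []
    for start in range(len(text)):
        node = trie
        i = start
        # Follow the trie as far as the text allows from this start position.
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            i += 1
            if _END in node:
                hits.append((start, node[_END]))
    return hits


trie = build_trie(["word1", "word2", "or"])
print(find_all(trie, "word2sgjoisejfisaword1yguyg"))
# [(0, 'word2'), (1, 'or'), (17, 'word1'), (18, 'or')]
```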

Re: Regular Expressions: large amount of or's

2005-03-03 Thread Manlio Perillo
Hi. Python allows subclassing built-in classes, but the Python interpreter itself uses the built-in types. For example, keyword arguments are inserted into a dict, but I would like to use a user-defined SortedDict. Are there plans to add such a feature in a future version? Thanks and regards Manlio Perillo -
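A quick way to see the behaviour described above: even when a dict subclass is unpacked with **, the function receives a fresh plain dict. SortedDict here is a hypothetical empty subclass standing in for Manlio's class. Separately, PEP 468 (Python 3.6) guarantees that **kwargs preserves the order in which the keywords were passed.

```python
class SortedDict(dict):
    """Hypothetical stand-in for a user-defined mapping type."""


def kwargs_type(**kwargs):
    # The interpreter builds a new built-in dict for **kwargs, so the
    # caller's mapping type is not preserved.
    return type(kwargs)


def kwargs_order(**kwargs):
    return list(kwargs)


print(kwargs_type(**SortedDict(b=2, a=1)))  # <class 'dict'>
print(kwargs_order(b=2, a=1))               # ['b', 'a'] on Python 3.6+
```

The usual workaround is to convert inside the function, for example SortedDict(kwargs), or to accept the mapping as an ordinary positional argument.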

Re: Regular Expressions: large amount of or's

2005-03-03 Thread Manlio Perillo
On Tue, 1 Mar 2005 15:03:50 -0500, Tim Peters <[EMAIL PROTECTED]> wrote: >[André Søreng] >> Given a string, I want to find all occurrences of >> certain predefined words in that string. Problem is, the list of >> words that should be detected can be in the order of thousands. >> >> With the re modu

Re: Regular Expressions: large amount of or's

2005-03-02 Thread Gurpreet Sachdeva
You can divide the regex according to the letters the words start with, or you can iterate over the list. Regards, Garry http://garrythegambler.blogspot.com/ On Wed, 02 Mar 2005 12:50:01 +0100, André Søreng <[EMAIL PROTECTED]> wrote: > Ola Natvig wrote: > > André Søreng wrote: > >> Yes, b
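Garry's first suggestion, sketched under some assumptions: keywords are grouped by first character, one alternation is compiled per group, and at each position only the group for that character is tried. bucket_patterns and find_by_first_letter are hypothetical names. The sketch reports at most one keyword per start position. That is the longest one, because each group is sorted longest-first. It still visits every position of the text.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bucket_patterns(words: Iterable[str]) -> Dict[str, "re.Pattern[str]"]:
    """Compile one alternation per first character (longest words first)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for w in words:
        if w:  # skip empty strings, which have no first character
            groups[w[0]].append(w)
    return {
        first: re.compile("|".join(re.escape(w) for w in sorted(ws, key=len, reverse=True)))
        for first, ws in groups.items()
    }


def find_by_first_letter(buckets: Dict[str, "re.Pattern[str]"], text: str) -> List[Tuple[int, str]]:
    hits: List[Tuple[int, str]] = []
    for i, ch in enumerate(text):
        pattern = buckets.get(ch)
        if pattern is not None:
            m = pattern.match(text, i)  # anchored at position i
            if m:
                hits.append((i, m.group()))
    return hits


buckets = bucket_patterns(["word1", "word2", "spam"])
print(find_by_first_letter(buckets, "word2sgjoisejfisaword1yguyg"))  # [(0, 'word2'), (17, 'word1')]
```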

Re: Regular Expressions: large amount of or's

2005-03-02 Thread André Søreng
Ola Natvig wrote: André Søreng wrote: Yes, but I was looking for a solution which would scale. Searching through the same string 1+++ times does not seem like a suitable solution. André Just out of curiosity, what would a regexp do? Perhaps there's a clue in how you could do this in the way reg

Re: Regular Expressions: large amount of or's

2005-03-02 Thread Ola Natvig
André Søreng wrote: Yes, but I was looking for a solution which would scale. Searching through the same string 1+++ times does not seem like a suitable solution. André Just out of curiosity, what would a regexp do? Perhaps there's a clue in how you could do this in the way regexps are executed.
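A rough answer to Ola's question, as a sketch. For a pattern made only of literal alternatives, Python's backtracking re engine behaves as if it tried the words in order at each position and took the first one that matched. The re documentation describes '|' as tried left to right and never greedy. The model below ignores the engine's internal optimizations, and alternation_findall is a hypothetical name. It shows why thousands of alternatives hurt: in the worst case, every position is compared against every word.

```python
import re
from typing import List


def alternation_findall(words: List[str], text: str) -> List[str]:
    """Rough model of re.findall for a literal pattern 'w1|w2|...': at each
    position try the alternatives in order; the first one that matches wins
    and the scan resumes after it. Assumes every word is non-empty."""
    found: List[str] = []
    pos = 0
    while pos < len(text):
        for w in words:
            if text.startswith(w, pos):
                found.append(w)
                pos += len(w)
                break
        else:  # no alternative matched here: move one character on
            pos += 1
    return found


words = ["ab", "abc"]
print(alternation_findall(words, "abcab"))                   # ['ab', 'ab']
print(re.findall("|".join(map(re.escape, words)), "abcab"))  # ['ab', 'ab']
```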

Re: Regular Expressions: large amount of or's

2005-03-02 Thread André Søreng
Daniel Yoo wrote: Kent Johnson <[EMAIL PROTECTED]> wrote: :> Given a string, I want to find all occurrences of :> certain predefined words in that string. Problem is, the list of :> words that should be detected can be in the order of thousands. :> :> With the re module, this can be solved somethin

Re: Regular Expressions: large amount of or's

2005-03-02 Thread André Søreng
Bill Mill wrote: On Tue, 01 Mar 2005 22:04:15 +0100, André Søreng <[EMAIL PROTECTED]> wrote: Kent Johnson wrote: André Søreng wrote: Hi! Given a string, I want to find all occurrences of certain predefined words in that string. Problem is, the list of words that should be detected can be in the ord

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Daniel Yoo
Kent Johnson <[EMAIL PROTECTED]> wrote: :> Given a string, I want to find all occurrences of :> certain predefined words in that string. Problem is, the list of :> words that should be detected can be in the order of thousands. :> :> With the re module, this can be solved something like this: :>

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Nick Craig-Wood
André Søreng <[EMAIL PROTECTED]> wrote: > Given a string, I want to find all occurrences of > certain predefined words in that string. Problem is, the list of > words that should be detected can be in the order of thousands. > > With the re module, this can be solved something like this: > >

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Anthra Norell
- Original Message - From: "André Søreng" <[EMAIL PROTECTED]> Newsgroups: comp.lang.python To: Sent: Tuesday, March 01, 2005 8:46 PM Subject: Regular Expressions: large amount of or's > > Hi! > > Given a string, I want to find all occurrences of > certain prede

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Kent Johnson
André Søreng wrote: Hi! Given a string, I want to find all occurrences of certain predefined words in that string. Problem is, the list of words that should be detected can be in the order of thousands. With the re module, this can be solved something like this: import re r = re.compile("word1|word2
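The quoted approach, fleshed out as a sketch. compile_keywords is a hypothetical helper and the word list is invented for illustration. Two details matter once the alternation is generated from a list rather than typed in. First, each word should be escaped. Second, longer words should come first, because alternation accepts the first alternative that matches.

```python
import re
from typing import Iterable


def compile_keywords(words: Iterable[str]) -> "re.Pattern[str]":
    """Build one alternation from a keyword list (hypothetical helper)."""
    # re.escape makes characters such as '.', '+' or '|' match literally.
    # Longer words go first so that 'word10' is tried before its prefix 'word1'.
    ordered = sorted(set(words), key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))


pattern = compile_keywords(["word1", "word2", "word10"])
print(pattern.findall("x word10 y word2"))  # ['word10', 'word2']
```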

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Bill Mill
On Tue, 01 Mar 2005 22:04:15 +0100, André Søreng <[EMAIL PROTECTED]> wrote: > Kent Johnson wrote: > > André Søreng wrote: > >> Hi! > >> Given a string, I want to find all occurrences of > >> certain predefined words in that string. Problem is, the list of > >> words that should be dete

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Francis Girard
On Tuesday 1 March 2005 22:04, André Søreng wrote: > That is not exactly what I want. It should discover if some of > the predefined words appear as substrings, not only as equal > words. For instance, after matching "word2sgjoisejfisaword1yguyg", word2 > and word1 should be detected. Hi, A lexer
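An illustration of the substring requirement, using the example string from the quoted message. An alternation without \b word-boundary anchors already finds keywords embedded in longer runs of characters. By default, though, it does not report overlapping occurrences. Wrapping the alternation in a zero-width lookahead with a capture group is a common re idiom for that. The extra keyword 'ord' is added here only to show the overlap case.

```python
import re

words = ["word1", "word2", "ord"]
alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))

text = "word2sgjoisejfisaword1yguyg"

# Plain alternation: no \b anchors, so keywords are found inside longer runs,
# but matches cannot overlap ('ord' inside 'word2' is consumed).
print(re.findall(alternation, text))             # ['word2', 'word1']

# Zero-width lookahead with a capture group reports one match per start
# position, so overlapping keywords at different offsets are found too.
print(re.findall(f"(?=({alternation}))", text))  # ['word2', 'ord', 'word1', 'ord']
```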

Re: Regular Expressions: large amount of or's

2005-03-01 Thread André Søreng
Kent Johnson wrote: André Søreng wrote: Hi! Given a string, I want to find all occurrences of certain predefined words in that string. Problem is, the list of words that should be detected can be in the order of thousands. With the re module, this can be solved something like this: import re r = re.

Re: Regular Expressions: large amount of or's

2005-03-01 Thread James Stroud
This does not sound like a job for a single regex. Using a list and listcomp (say your words are in a list called "mywordlist") you can make this quite terse. Of course I have a way of writing algorithms that have very large exp when people tell me the O(N^exp). try this: myregexlist = [re.co
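James's listing breaks off mid-line, so here is a minimal sketch of the one-regex-per-word idea. The names mywordlist and myregexlist come from his post; words_found and the sample words are hypothetical. For plain literal words, "w in text" would do the same job without regexes.

```python
import re
from typing import List

mywordlist = ["word1", "word2", "spam"]  # example keywords (illustrative)
myregexlist = [re.compile(re.escape(w)) for w in mywordlist]


def words_found(text: str) -> List[str]:
    # One pass over the text per keyword: simple, but the work grows with
    # len(mywordlist) * len(text), which is the scaling André objects to.
    return [w for w, rx in zip(mywordlist, myregexlist) if rx.search(text)]


print(words_found("word2sgjoisejfisaword1yguyg"))  # ['word1', 'word2']
```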

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Kent Johnson
André Søreng wrote: Hi! Given a string, I want to find all occurrences of certain predefined words in that string. Problem is, the list of words that should be detected can be in the order of thousands. With the re module, this can be solved something like this: import re r = re.compile("word1|word2

Re: Regular Expressions: large amount of or's

2005-03-01 Thread Tim Peters
[André Søreng] > Given a string, I want to find all occurrences of > certain predefined words in that string. Problem is, the list of > words that should be detected can be in the order of thousands. > > With the re module, this can be solved something like this: > > import re > > r = re.compile("wo