Re: Regular Expression question

Fredrik Lundh Thu, 01 Dec 2005 10:16:13 -0800

Michelle McCall wrote:

>I have a script that needs to scan every line of a file for numerous
> strings.  There are groups of strings for each "area" of data we are looking
> for.  Looping through each of these list of strings separately for each line
> has slowed execution to a crawl.  Can I create ONE regular expression from a
> group of strings such that when I perform a search on a line from the file
> with this RE it will search the line for each one of the strings in the RE ?


does

    m = re.search("spam|egg|bacon", line)

do what you want?

if you need all matches, you can use

    for m in re.finditer("spam|egg|bacon", line):
        ...
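
since the question mentions a separate group of strings for each "area", here is a rough sketch of one way to fold all the groups into a single pattern and still tell which area a hit belongs to. it wraps each group in a named subpattern and reads the name back from m.lastgroup. the area names and word lists are made up for illustration, build_area_pattern is a hypothetical helper, and the area names are assumed to be valid Python identifiers:

```python
import re

# hypothetical areas and word lists, for illustration only
areas = {
    "meat": ["spam", "bacon"],
    "other": ["egg"],
}

def build_area_pattern(areas):
    parts = []
    for name, words in areas.items():
        # escape each literal; reverse lexical order keeps a longer word
        # ahead of any shorter word that is its prefix
        alternatives = "|".join(map(re.escape, sorted(words, reverse=True)))
        # one named group per area
        parts.append("(?P<%s>%s)" % (name, alternatives))
    return re.compile("|".join(parts))

area_pattern = build_area_pattern(areas)

line = "egg and bacon and spam"
# m.lastgroup is the name of the area group that matched
hits = [(m.lastgroup, m.group(0)) for m in area_pattern.finditer(line)]
```

this scans each line once instead of once per list. whether it is actually faster for a given set of strings is something to measure.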

if the strings are all literal strings (i.e. no subpatterns), a little
preparation might speed things up:

    words = ["spam", "spim", "spum", "spamwall", "wallspam"]
    words.sort() # lexical order
    words.reverse() # longer words now precede their own prefixes ("spamwall" before "spam")
    pattern = "|".join(map(re.escape, words))
    pattern = re.compile(pattern)

    for m in pattern.finditer(line):
        ...
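
to see why the ordering matters, here is a small self-contained sketch (the sample line and the build_pattern helper are made up for illustration). alternation takes the first alternative that matches at a given position, so "spam" listed before "spamwall" hides the longer word:

```python
import re

words = ["spam", "spim", "spum", "spamwall", "wallspam"]

def build_pattern(words):
    # reverse lexical order: a longer word sorts ahead of any word that is its prefix
    ordered = sorted(words, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))

pattern = build_pattern(words)

line = "a spamwall next to some spim and wallspam"
found = [m.group(0) for m in pattern.finditer(line)]

# without the reordering, the shorter alternative wins
naive = re.search("spam|spamwall", "spamwall").group(0)
```

here found contains "spamwall", "spim" and "wallspam", while naive is just "spam".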

</F> 
