Francis Girard wrote:
On Tuesday 1 March 2005 16:52, Marc Huffnagle wrote:

[line for line in document if (line.find('word') != -1 \
       and line.find('wordtwo') != -1)]


Hi,

Using re might be faster than scanning the same line twice:

My understanding of the second question was that he wanted to find lines which contained both words but, looking at it again, it could go either way. If he wants to find lines that contain both of the words, in any order, then I don't think that it can be done without scanning the line twice (regex or not).
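
For the "both words, in any order" reading, here is a minimal sketch of two ways to write the test. The function names and the sample words are made up for illustration. The regex version uses two lookaheads, so it is a single pattern, but the engine may well still walk the line more than once internally; this sketch does not settle which is cheaper.

```python
import re

def lines_with_both(lines, word, word2):
    """Lazily yield lines that contain both substrings, in any order."""
    return (line for line in lines if word in line and word2 in line)

def lines_with_both_re(lines, word, word2):
    """Same test with one compiled pattern built from two lookaheads."""
    # re.escape keeps characters such as '.' or '*' in the words literal.
    pattern = re.compile(r"(?=.*%s)(?=.*%s)" % (re.escape(word), re.escape(word2)))
    return (line for line in lines if pattern.match(line))

sample = ["spam and eggs\n", "only spam\n", "eggs then spam\n", "neither\n"]
for line in lines_with_both(sample, "spam", "eggs"):
    print(line, end="")
```

Both functions return generators, so they can be fed an open file just as well as a list.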


To the OP: What kind of data are you testing? Could you try both of these solutions on your sample data and let us know which runs faster?
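
One way to run that comparison is with the standard timeit module. This is only a sketch: the sample lines and the words "spam" and "eggs" are invented, and real data may give a very different picture. Both functions test for both words so that they return the same lines.

```python
import re
import timeit

# Invented sample data; replace with lines from the real document.
sample = ["spam and eggs\n", "only spam\n", "nothing here\n"] * 1000

def with_find(lines):
    """Two substring scans per line, as in the list comprehension above."""
    return [l for l in lines if l.find("spam") != -1 and l.find("eggs") != -1]

# One compiled pattern that requires both words, in any order.
both = re.compile(r"(?=.*spam)(?=.*eggs)")

def with_re(lines):
    return [l for l in lines if both.match(l)]

t_find = timeit.timeit(lambda: with_find(sample), number=20)
t_re = timeit.timeit(lambda: with_re(sample), number=20)
print("find: %.4fs  re: %.4fs" % (t_find, t_re))
```

The absolute numbers depend on the machine and on the data, so only the ratio between the two is worth reporting.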


=== begin snap ## rewords.py

import re
import sys

def iWordsMatch(lines, word, word2):
    # Matches lines containing either word (not necessarily both).
    reWordOneTwo = re.compile(r".*(%s|%s).*" % (word, word2))
    return (line for line in lines if reWordOneTwo.match(line))

for line in iWordsMatch(open("rewords.py"), "re", "return"):
    sys.stdout.write(line)
=== end snap


Furthermore, by using a generator expression (new in Python 2.4) together with the file iterator, you can scan files as big as you want with very little memory usage.
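
To illustrate why memory usage stays low, here is a small sketch showing that a generator expression only pulls lines from its source as they are requested. The numbered_source function and the consumed list are hypothetical stand-ins for a large file; they exist only to count how many lines were actually read.

```python
consumed = []  # records every line number the source has produced so far

def numbered_source():
    """Stand-in for a very large file: yields one line at a time."""
    for i in range(1000000):
        consumed.append(i)
        yield "line %d\n" % i

# Nothing is read yet; the generator expression is evaluated lazily.
matches = (line for line in numbered_source() if "7" in line)

first = next(matches)          # pulls lines only until the first match
print(first, end="")
print("lines read so far:", len(consumed))
```

A list comprehension over the same source would build the full result list before returning, which is what the generator expression avoids.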

Regards,

Francis Girard

--
http://mail.python.org/mailman/listinfo/python-list
