On Feb 1, 3:37 am, Peter Otten <__pete...@web.de> wrote:
> Hussein B wrote:
> > Hey,
> > I have a log file that doesn't contain the word "Haskell" at all, I'm
> > just trying to do a little performance comparison:
> > ++++++++++++++
> > from datetime import time, timedelta, datetime
> > start = datetime.now()
> > print start
> > lines = [line for line in file('/media/sda4/Servers/Apache/Tomcat-6.0.14/logs/catalina.out') if line.find('Haskell')]
> > print 'Number of lines contains "Haskell" = ' + str(len(lines))
> > end = datetime.now()
> > print end
> > ++++++++++++++
> > Well, the script is returning the whole file's lines number !!
> > What is wrong in my logic?
> > Thanks.
>
> """
> find(...)
>     S.find(sub [,start [,end]]) -> int
>
>     Return the lowest index in S where substring sub is found,
>     such that sub is contained within s[start:end].  Optional
>     arguments start and end are interpreted as in slice notation.
>
>     Return -1 on failure.
> """
>
> a.find(b) returns -1 if b is not found. -1 evaluates to True in a boolean
> context.
>
> Use
>
> [line for line in open(...) if line.find("Haskell") != -1]
>
> or, better
>
> [line for line in open(...) if "Haskell" in line]
>
> to get the expected result.
>
> Peter
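A minimal sketch of the pitfall Peter describes. The three sample lines are made up and stand in for the log file; the list names are illustrative only. Note that the truthiness test also goes wrong in the other direction: a line that *starts* with "Haskell" gets index 0, which is falsy, so it is dropped.

```python
# Made-up sample lines standing in for catalina.out.
lines = ["no match here\n", "Haskell at the start\n", "mid Haskell line\n"]

# Buggy: str.find returns -1 (truthy) when the word is absent,
# and 0 (falsy) when the word is at the very start of the line.
buggy = [line for line in lines if line.find("Haskell")]

# Fixed: compare against -1 explicitly ...
fixed_find = [line for line in lines if line.find("Haskell") != -1]

# ... or, more idiomatically, use the `in` operator.
fixed_in = [line for line in lines if "Haskell" in line]
```

Here `buggy` keeps the non-matching first line and loses the second, while both fixed versions keep exactly the two lines that contain "Haskell".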
Or better, use a generator expression and let sum() do the counting:

sum(1 for line in open(...) if "Haskell" in line)

This yields 1 per matching line instead of allocating a list that holds every line containing "Haskell".

http://www.python.org/dev/peps/pep-0289/
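A minimal sketch of the counting approach, assuming the goal is only the number of matching lines. `count_lines_containing` is a hypothetical helper, and a throwaway temp file stands in for catalina.out. The generator must yield a number such as 1, not the line itself: sum() starts from the integer 0, so summing the lines directly would raise TypeError.

```python
import os
import tempfile


def count_lines_containing(path, word):
    """Count lines in the file at `path` that contain `word` (hypothetical helper).

    The generator expression yields 1 per matching line, so sum() adds
    integers and no list of matching lines is ever built.
    """
    with open(path) as f:
        return sum(1 for line in f if word in line)


# Usage: write a small temp file with made-up contents, count, then clean up.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("no match\nHaskell first\nmid Haskell\n")
count = count_lines_containing(path, "Haskell")
os.remove(path)
```

Using a `with` block also closes the file deterministically, which the bare `open(...)` inside a comprehension does not guarantee.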