david.gar...@gmail.com wrote:
I am looking for the fastest way to parse a log file.
Currently I have this... Can I speed this up at all? The script is
written to be a generic log file parser, so I can't rely on any
predictable pattern.
def check_data(data, keywords):
    # get rid of duplicate lines before searching
    unique_list = list(set(data))
    string_list = ' '.join(unique_list)
    #print string_list
    for keyword in keywords:
        if keyword in string_list:
            return True
    return False
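
For what it's worth, the join can be skipped entirely: a short-circuiting
scan over the unique lines returns on the first hit without building one
big string. This is only a sketch (check_data_fast is a made-up name, and
it assumes a keyword never spans two lines, which the space-joined version
above would allow):

def check_data_fast(data, keywords):
    # Scan each unique line directly; any() stops at the first match,
    # so no joined string is ever built.
    unique_lines = set(data)
    return any(keyword in line
               for line in unique_lines
               for keyword in keywords)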
I am currently using file seek and maintaining a file that records the last byte offset:
with open(filename) as f:
    print "Here is filename:%s" % filename
    f.seek(0, 2)  # jump to end of file
    eof = f.tell()
    print "Here is eof:%s" % eof
    if last is not None and eof - int(last) > 0:
        # last run's offset exists and the file has grown since then
        print "Here is last:%s" % last
        offset = int(last) - eof  # negative: seek back this many bytes from EOF
        print "Here is new offset:%s" % offset
        f.seek(offset, 2)
        mylist = f.readlines()
    else:
        # first run, or last is greater than current (file was truncated)
        f.seek(0)
        bof = f.tell()
        print "Here is bof:%s" % bof
        mylist = f.readlines()
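
For comparison, here is the same idea folded into one function. This is
just a sketch: read_new_lines is a made-up name, and how `last` gets
persisted between runs is left out. It seeks straight to the saved offset
instead of computing a negative offset from EOF; the two are equivalent,
but seeking to `last` also avoids rereading the whole file when nothing
new has been written (the else branch above does a full reread when
eof == last):

def read_new_lines(filename, last):
    # Returns (new_lines, new_offset). `last` is the byte offset
    # recorded after the previous run, or None on the first run.
    with open(filename) as f:
        f.seek(0, 2)           # jump to end of file
        eof = f.tell()
        if last is not None and int(last) <= eof:
            f.seek(int(last))  # resume where the previous run stopped
        else:
            f.seek(0)          # first run, or the file was truncated
        return f.readlines(), eof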
Thanks,
--
David Garvey
I have a log parser that takes action upon some log patterns.
I rely on the system 'grep' program to do the hard work, i.e. find the
occurrences.
Of course that means it is system dependent, but I don't think you can
beat grep's speed.
import subprocess

def _grep(self, link, pattern):
    # return the number of occurrences of the pattern in the file
    proc = subprocess.Popen(['grep', '-c', pattern, link],
                            stdout=subprocess.PIPE)
    return int(proc.communicate()[0])
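
In case it helps, here is the same call in standalone form, with a note
on grep's exit behavior; grep_count and the log path are made up for
illustration:

import subprocess

def grep_count(pattern, path):
    # grep -c prints the number of matching lines; with no matches it
    # still prints 0 (and exits nonzero), which int() parses fine.
    proc = subprocess.Popen(['grep', '-c', pattern, path],
                            stdout=subprocess.PIPE)
    return int(proc.communicate()[0])

# e.g. grep_count('Traceback', '/var/log/app.log')

One caveat: grep treats the pattern as a regular expression, so pass -F
if the keywords are plain strings.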
Cheers,
JM