Apache log munging
I have written a generator for an Apache log that yields two pieces of information per request: the hostname and the filename requested. The 'log' generator can be consumed like this:

    for r in log:
        print r['host'], r['filename']

I want to find the top 100 hosts (sorted in descending order of total requests), like this:

    host filename1 filename2 filename3 Total
    hostA 6 9 45 110
    hostC 4 4343 98
    hostB 344 45 83

and so on. Is there a fast way to do this without scanning the log file many times?

Thanks in advance.
- Jo
--
http://mail.python.org/mailman/listinfo/python-list
Re: Apache log munging
I am currently using the following technique to get the info above:

    from collections import defaultdict

    all = defaultdict(int)       # requests per (host, filename) pair
    hosts = defaultdict(int)     # total requests per host
    filename = defaultdict(int)  # used as a set of the filenames seen
    for r in log:
        all[r['host'], r['filename']] += 1
        hosts[r['host']] += 1
        filename[r['filename']] = 1
    # hosts in descending order of total requests
    for host in sorted(hosts, key=hosts.get, reverse=True):
        for file in filename:
            print host, all[host, file]
        print hosts[host]

I was looking for a better option than building three collections, to improve performance.

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python <[EMAIL PROTECTED]> wrote:
> > I want to find the top '100' hosts (sorted in descending order of total
> > requests) like follows:
> > Is there a fast way to this without scanning the log file many times?
>
> As you encounter a new "host" add it to a dict (or another type of
> collection), and if encountered again, use that "host" as the key to
> retrieve the dict entry and increment its request count. You should
> only have to read the file once.
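One possible way to get by with a single collection is to keep one Counter of filenames per host and derive the totals from it, then pick the top hosts with heapq.nlargest instead of sorting every host. The following is only a sketch under stated assumptions: it is written for Python 3, it assumes each record is a dict with 'host' and 'filename' keys as in the first message, and the names summarize, top_hosts and sample_log are made up for illustration.

```python
from collections import Counter, defaultdict
import heapq


def summarize(log):
    """Read the log once and build a Counter of filenames for each host."""
    per_host = defaultdict(Counter)
    for r in log:
        per_host[r['host']][r['filename']] += 1
    return per_host


def top_hosts(per_host, n=100):
    """Return (host, total, counts) for the n hosts with the most requests."""
    totals = ((sum(counts.values()), host) for host, counts in per_host.items())
    largest = heapq.nlargest(n, totals)  # descending by total
    return [(host, total, per_host[host]) for total, host in largest]


# Hypothetical stand-in for the real 'log' generator.
sample_log = [
    {'host': 'hostA', 'filename': 'a.html'},
    {'host': 'hostB', 'filename': 'a.html'},
    {'host': 'hostA', 'filename': 'b.html'},
]

per_host = summarize(sample_log)
# Every filename seen, used as the table columns.
files = sorted(set().union(*per_host.values()))
for host, total, counts in top_hosts(per_host, n=100):
    # A Counter returns 0 for filenames this host never requested.
    print(host, *(counts[f] for f in files), total)
```

The log is still read only once. Whether this is actually faster than the three-dict version depends on the data, so it is worth timing both on a real log.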
splitting a string into an array using a time value
I want to find a way to split a string into an array using a time value.

    s = r"""
    8/25/2008 11:10:08 AM Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed imperdiet luctus nisl. ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum tortor non lacus.
    0/3/2008 5:10:23 PM ras quis ante id lacus sodales accumsan. Morbi bibendum iaculis purus
    10/6/2008 4:39:55 PM Maecenas lectus libero, tincidunt sed
    """

I am looking for an output in the form of an array as follows:

    resulting-array = [
    8/25/2008 11:10:08 AM Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed imperdiet luctus nisl. ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum tortor non lacus.,
    0/3/2008 5:10:23 PM ras quis ante id lacus sodales accumsan. Morbi bibendum iaculis purus,
    10/6/2008 4:39:55 PM Maecenas lectus libero, tincidunt sed
    ]

Note: there is one element in the array for each time entry.

I tried to use the following pattern, but it's not working:

    import re

    pattern = r'(\d+/\d+/\d+ \d+:\d+:\d+ .+)'
    pat = re.compile(pattern)
    result = re.split(pat, s)

- Joe Python
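A likely reason the attempt misbehaves is that re.split treats whatever the pattern matches as the separator. Because the pattern has a capturing group, the matched text is handed back, but interleaved with the leftover whitespace between matches, and `.+` stops at the first newline of each entry. One alternative is to collect the entries with findall, letting each match run lazily until the next timestamp or the end of the string. This is a rough sketch rather than a definitive fix: it assumes every entry begins with a `M/D/YYYY H:MM:SS AM|PM` stamp, and the names STAMP, ENTRY and split_entries are invented for illustration.

```python
import re

# Assumed timestamp shape, e.g. "8/25/2008 11:10:08 AM".
STAMP = r'\d+/\d+/\d+ \d+:\d+:\d+ [AP]M'

# One entry = a timestamp plus everything up to the next timestamp
# (or the end of the string). DOTALL lets '.' cross line breaks.
ENTRY = re.compile(r'(%s.*?)(?=%s|\Z)' % (STAMP, STAMP), re.DOTALL)


def split_entries(text):
    """Return one string per timestamped entry, surrounding whitespace removed."""
    return [entry.strip() for entry in ENTRY.findall(text)]


# Shortened stand-in for the string in the question.
s = """
8/25/2008 11:10:08 AM Lorem ipsum dolor sit amet,
consectetuer adipiscing elit.
10/6/2008 4:39:55 PM Maecenas lectus libero, tincidunt sed
"""

for entry in split_entries(s):
    print(repr(entry))
```

Splitting on a zero-width lookahead with re.split is another option, but re.split only splits on empty matches from Python 3.7 onwards, so findall is the more portable choice here.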