I am currently using the following technique to get the info above:

from collections import defaultdict

all = defaultdict(int)       # (host, file) -> request count; note: shadows built-in all()
hosts = defaultdict(int)     # host -> total request count
filename = defaultdict(int)  # used as a set of every file seen

for r in log:
    all[r['host'],r['file']] += 1
    hosts[r['host']] += 1
    filename[r['file']] = 1

# hosts in descending order of total requests
for host in sorted(hosts,key=hosts.get, reverse=True):
    for file in filename:
        print host, all[host,file]
    print hosts[host]

I was looking for a better option than building 'three' collections, to
improve performance.

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python <[EMAIL PROTECTED]> wrote:
> > I want to find the top '100' hosts (sorted in descending order of total
> > requests) like follows:
> > Is there a fast way to do this without scanning the log file many times?
>
> As you encounter a new "host" add it to a dict (or another type of
> collection), and if encountered again, use that "host" as the key to
> retrieve the dict entry and increment its request count. You should
> only have to read the file once.
>
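The single-pass idea in the quoted reply could be sketched roughly as
below. This is only an illustration, not the code from either poster: it
assumes `log` is an iterable of dicts with 'host' and 'file' keys, as in
the snippet above, and the names `top_hosts`, `per_host`, `totals` and `n`
are made up for the example. It swaps the full sort for `heapq.nlargest`,
which only has to track the top n hosts.

```python
from collections import defaultdict
import heapq

def top_hosts(log, n=100):
    """Return [(host, total, {file: count})] for the n busiest hosts.

    Hypothetical helper: `log` is assumed to yield dicts with
    'host' and 'file' keys, read exactly once.
    """
    per_host = defaultdict(lambda: defaultdict(int))  # host -> file -> count
    totals = defaultdict(int)                         # host -> total requests
    for r in log:
        per_host[r['host']][r['file']] += 1
        totals[r['host']] += 1
    # Pick the n hosts with the largest totals, in descending order.
    top = heapq.nlargest(n, totals, key=totals.get)
    return [(h, totals[h], dict(per_host[h])) for h in top]
```

Unlike the loop above, this keeps only the files each host actually
requested, so a host/file pair with zero requests is simply absent instead
of being reported as 0.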
-- http://mail.python.org/mailman/listinfo/python-list