I am currently using the following technique to get the info above:

from collections import defaultdict

counts = defaultdict(int)   # (host, file) -> number of requests
hosts = defaultdict(int)    # host -> total number of requests
filenames = set()           # every file seen in the log

for r in log:
    counts[r['host'], r['file']] += 1
    hosts[r['host']] += 1
    filenames.add(r['file'])

# Hosts in descending order of total requests
for host in sorted(hosts, key=hosts.get, reverse=True):
    for name in filenames:
        print host, counts[host, name]
    print hosts[host]

I was looking for a better option than building three collections,
to improve performance.
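
One possible way to get by with a single collection, offered only as a
sketch: keep a per-host dict of file counts and derive each host's total
from it, then pick the busiest hosts with heapq.nlargest instead of
sorting all of them. The names top_hosts, per_host and sample_log are
hypothetical, the sample records are made up, and the code is written to
run on both Python 2 and Python 3. Whether it is actually faster would
need measuring on the real log.

```python
from collections import defaultdict
import heapq


def top_hosts(log, n=100):
    """Return [(host, total, {file: count}), ...] for the n busiest hosts."""
    # The only collection: host -> {file -> request count}.
    per_host = defaultdict(lambda: defaultdict(int))
    for r in log:
        per_host[r['host']][r['file']] += 1

    # A host's total is the sum of its per-file counts, so no separate
    # dict of totals is kept.
    totals = ((sum(files.values()), host) for host, files in per_host.items())

    # nlargest avoids sorting every host when only the top n are wanted.
    return [(host, total, dict(per_host[host]))
            for total, host in heapq.nlargest(n, totals)]


# Hypothetical records, shaped like the entries of 'log' (assumption).
sample_log = [
    {'host': 'a.example', 'file': '/index.html'},
    {'host': 'b.example', 'file': '/index.html'},
    {'host': 'a.example', 'file': '/about.html'},
    {'host': 'a.example', 'file': '/index.html'},
]

for host, total, files in top_hosts(sample_log, n=2):
    for name in sorted(files):
        print('%s %s %d' % (host, name, files[name]))
    print('%s total %d' % (host, total))
```

Unlike the loop above, this lists only the files a host actually
requested, so files with a zero count for that host are not printed.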

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python <[EMAIL PROTECTED]> wrote:
> > I want to find the top 100 hosts (sorted in descending order of total
> > requests) as follows:
> > Is there a fast way to do this without scanning the log file many times?
>
> As you encounter a new "host" add it to a dict (or another type of
> collection), and if encountered again, use that "host" as the key to
> retrieve the dict entry and increment its request count. You should
> only have to read the file once.
>
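
The single-pass counting described in the quoted reply might look roughly
like the sketch below. It assumes each log record is a dict with a 'host'
key, as in the code earlier in this message. The names count_requests,
top_n and sample_log are hypothetical, and the sample data is made up.

```python
def count_requests(log):
    """Read the log once and return {host: request count}."""
    counts = {}
    for r in log:
        host = r['host']
        if host in counts:
            # Host seen before: increment its request count.
            counts[host] += 1
        else:
            # First time this host appears: add it to the dict.
            counts[host] = 1
    return counts


def top_n(counts, n=100):
    """Return the n (host, count) pairs with the highest counts."""
    by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return by_count[:n]


# Hypothetical records (assumption); only the 'host' key is used here.
sample_log = [
    {'host': 'a.example'},
    {'host': 'b.example'},
    {'host': 'a.example'},
    {'host': 'c.example'},
    {'host': 'a.example'},
    {'host': 'b.example'},
]

for host, count in top_n(count_requests(sample_log), n=2):
    print('%s %d' % (host, count))
```

The file is still read only once. The sort runs over the distinct hosts,
not over the log lines.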
--
http://mail.python.org/mailman/listinfo/python-list