Apache log munging

2008-10-08 Thread Joe Python
I have written a generator for an Apache log which returns two pieces of
information: the hostname and the filename requested.

The 'log' generator can be 'consumed' like this:

for r in log:
  print r['host'], r['filename']

I want to find the top 100 hosts (sorted in descending order of total
requests), as follows:

host  filename1  filename2 filename3 Total

hostA   6  9 45 110
hostC   4 4343  98
hostB   344 45  83

and so on.
Is there a fast way to do this without scanning the log file many times?
Thanks in advance.
- Jo
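
For readers who want to reproduce the setup, here is a minimal sketch of a generator that can be consumed the same way as the 'log' generator described above. The poster's actual generator is not shown in the thread, so everything here is an assumption: the name parse_log, the LINE_RE pattern, the sample lines, and the premise that the log is in Apache's Common Log Format. It is written for Python 3.

```python
import re

# Assumed layout (Common Log Format):
#   host ident user [time] "METHOD path PROTOCOL" status size
# Adjust the pattern if the real log uses a different format.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<filename>\S+)[^"]*"'
)

def parse_log(lines):
    """Yield {'host': ..., 'filename': ...} for each line that matches."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # silently skip lines that do not look like log entries
            yield m.groupdict()

# Hypothetical sample lines, not taken from the thread
sample = [
    '10.0.0.1 - - [08/Oct/2008:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
    '10.0.0.2 - - [08/Oct/2008:13:55:40 -0400] "GET /about.html HTTP/1.1" 200 512',
]

log = parse_log(sample)
for r in log:
    print(r['host'], r['filename'])
```

Because parse_log is a generator, it can be consumed only once, which is why a single-pass solution matters for the question above.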
--
http://mail.python.org/mailman/listinfo/python-list


Re: Apache log munging

2008-10-08 Thread Joe Python
I am currently using the following technique to get the info above:

from collections import defaultdict

counts = defaultdict(int)   # (host, file) -> number of requests
hosts = defaultdict(int)    # host -> total number of requests
filenames = set()           # every file name seen

for r in log:
    counts[r['host'], r['filename']] += 1
    hosts[r['host']] += 1
    filenames.add(r['filename'])

# Hosts in descending order of total requests
for host in sorted(hosts, key=hosts.get, reverse=True):
    for name in filenames:
        print host, counts[host, name]
    print hosts[host]

I am looking for a better option than building three collections,
to improve performance.

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python <[EMAIL PROTECTED]> wrote:
> > I want to find the top 100 hosts (sorted in descending order of total
> > requests), as follows:
> > Is there a fast way to do this without scanning the log file many times?
>
> As you encounter a new "host" add it to a dict (or another type of
> collection), and if encountered again, use that "host" as the key to
> retrieve the dict entry and increment its request count. You should
> only have to read the file once.
>
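
The single-pass idea in the quoted reply can be combined with per-file counts in one nested collection. The following is a sketch under stated assumptions, not the poster's code: the function name top_hosts and the sample records are hypothetical, the records are assumed to carry the 'host' and 'filename' keys from the first message, and it targets Python 3 (collections.Counter did not exist in the Python releases current when this thread was written).

```python
from collections import Counter, defaultdict
import heapq

def top_hosts(log, n=100):
    """Return [(host, per_file_counts, total)] for the n busiest hosts."""
    per_host = defaultdict(Counter)   # host -> Counter of filename -> requests
    for r in log:                     # single pass over the generator
        per_host[r['host']][r['filename']] += 1
    totals = {host: sum(files.values()) for host, files in per_host.items()}
    # nlargest avoids fully sorting every host when only the top n are needed
    best = heapq.nlargest(n, totals, key=totals.get)
    return [(host, per_host[host], totals[host]) for host in best]

# Hypothetical records standing in for the real 'log' generator
sample = [
    {'host': 'hostA', 'filename': 'a.html'},
    {'host': 'hostB', 'filename': 'a.html'},
    {'host': 'hostA', 'filename': 'b.html'},
    {'host': 'hostA', 'filename': 'a.html'},
]

for host, files, total in top_hosts(iter(sample), n=2):
    print(host, dict(files), total)
```

The log is read once, and the only state kept is one Counter per host. Whether this is actually faster than three flat dicts depends on the data, so it is worth timing both on a real log.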


splitting a string into an array using a time value

2008-10-14 Thread Joe Python
I want to find a way to split a string into an array using a time value.
s = r"""
  8/25/2008 11:10:08 AM  Lorem ipsum dolor sit amet, consectetuer
adipiscing elit. Sed imperdiet luctus nisl.
  ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum
tortor non lacus.  0/3/2008 5:10:23 PM
  ras quis ante id lacus sodales accumsan. Morbi bibendum iaculis purus
10/6/2008 4:39:55 PM Maecenas lectus libero,
  tincidunt sed
  """
I am looking for an output in the form of an array as follows:

resulting-array = [ 8/25/2008 11:10:08 AM  Lorem ipsum dolor sit amet,
consectetuer adipiscing elit. Sed imperdiet luctus nisl.
  ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum
tortor non lacus.,

 0/3/2008 5:10:23 PM   ras quis ante id lacus sodales accumsan.
Morbi bibendum iaculis purus,

 10/6/2008 4:39:55 PM Maecenas lectus libero,   tincidunt sed ]

 Note: there is an element corresponding to each time entry in the array

I tried the following pattern, but it's not working:

import re

pattern = r'(\d+/\d+/\d+ \d+:\d+:\d+ .+)'
pat = re.compile(pattern)
result = re.split(pat, s)

- Joe Python
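
One likely reason the attempt above misbehaves is that the trailing `.+` swallows the rest of the line into the separator, and `re.split` with a capturing group then returns those separators mixed in with the leftover text. A minimal sketch of an alternative is to locate each timestamp with `finditer` and slice the string from one timestamp to the next. The names STAMP and split_on_timestamps and the shortened sample string are hypothetical, the code assumes every stamp ends in AM or PM, and it is written for Python 3.

```python
import re

# Assumption: every entry starts with a stamp like "8/25/2008 11:10:08 AM"
STAMP = re.compile(r'\d+/\d+/\d+ \d+:\d+:\d+ [AP]M')

def split_on_timestamps(text):
    """Return one string per timestamp, each running up to the next timestamp."""
    starts = [m.start() for m in STAMP.finditer(text)]
    ends = starts[1:] + [len(text)]
    # Collapse runs of whitespace (including newlines) inside each entry
    return [' '.join(text[a:b].split()) for a, b in zip(starts, ends)]

# Shortened, hypothetical sample in the same shape as the string above
s = """
  8/25/2008 11:10:08 AM  Lorem ipsum dolor sit amet.
  Sed imperdiet luctus nisl.  10/6/2008 4:39:55 PM Maecenas lectus libero,
  tincidunt sed
"""

for entry in split_on_timestamps(s):
    print(entry)
```

Any text before the first timestamp is dropped, and whitespace inside each entry is normalised to single spaces. Remove the `' '.join(...split())` step if the original line breaks should be kept.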