On Apr 3, 8:06 am, Mag Gam <magaw...@gmail.com> wrote:
> Thanks for the responses.
>
> Basically, I have a large file with this format,
>
> Date INFO username command srcipaddress filename
>
> I would like to do statistics on:
> total number of usernames and who they are
> username and commands
> username and filenames
> unique source ip addresses
> unique filenames
>
> Then I would like to bucket findings with days (date).
>
> Overall, I would like to build a log file analyzer.
>
> On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsali...@gmail.com> wrote:
>
> > On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <ros...@gmail.com> wrote:
>
> >> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magaw...@gmail.com> wrote:
> >> > I suppose I can do something like this.
> >> > (pseudocode)
>
> >> > d={}
> >> > try:
> >> >     d[key]+=1
> >> > except KeyError:
> >> >     d[key]=1
>
> >> > I was wondering if there is a pythonic way of doing this? I plan on
> >> > doing this many times for various files. Would the python collections
> >> > class be sufficient?
>
> >> I think you want collections.Counter. From the docs: "Counter objects
> >> have a dictionary interface except that they return a zero count for
> >> missing items instead of raising a KeyError".
>
> >> ChrisA
>
> > I realize you (Mag) asked for a Python solution, but since you mention
> > awk... you can also do this with "sort < input | uniq -c" - one line of
> > "code". GNU sort doesn't use as nice an algorithm as a hashing-based
> > solution (like you'd probably use with Python), but for a sort, GNU sort's
> > quite good.
>
> > --
> >http://mail.python.org/mailman/listinfo/python-list

Take a look at:
http://code.activestate.com/recipes/577535-aggregates-using-groupby-defaultdict-and-counter/
for some ideas of how to group and count things.
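
To make the Counter/defaultdict idea concrete for the log format quoted above, here is a rough sketch. It is not taken from the recipe. It assumes every line has exactly six whitespace-separated fields (so the date is a single token and filenames contain no spaces), and the function name analyze, the dictionary keys, and the sample lines are all made up for illustration. Counter needs Python 2.7 or 3.1 and later.

```python
from collections import Counter, defaultdict


def analyze(lines):
    """Tally the statistics asked for in the thread.

    Assumed layout (six whitespace-separated fields per line):
        date INFO username command srcipaddress filename
    """
    users = Counter()                     # username -> number of log lines
    user_commands = defaultdict(Counter)  # username -> command -> count
    user_files = defaultdict(Counter)     # username -> filename -> count
    source_ips = set()                    # unique source IP addresses
    filenames = set()                     # unique filenames
    per_day = defaultdict(Counter)        # date -> username -> count

    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        date, _level, user, command, ip, filename = fields
        users[user] += 1
        user_commands[user][command] += 1
        user_files[user][filename] += 1
        source_ips.add(ip)
        filenames.add(filename)
        per_day[date][user] += 1

    return {
        "users": users,
        "user_commands": user_commands,
        "user_files": user_files,
        "source_ips": source_ips,
        "filenames": filenames,
        "per_day": per_day,
    }


# Hypothetical sample data; a real run would pass an open file instead.
sample = [
    "2011-04-02 INFO alice get 10.0.0.1 report.txt",
    "2011-04-02 INFO bob put 10.0.0.2 data.csv",
    "2011-04-03 INFO alice get 10.0.0.1 data.csv",
]
stats = analyze(sample)
print(len(stats["users"]), "users:", sorted(stats["users"]))
print(stats["user_commands"]["alice"].most_common())
print(sorted(stats["source_ips"]))
```

Since an open file iterates line by line, analyze(open(path)) should work on a large log without reading it all into memory. The per_day mapping only buckets usernames by date; the other tallies could be bucketed the same way if needed.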