>
> The simplest. That would be #3, cleaned up a bit:
>
> from collections import defaultdict
> from csv import DictReader
> from pprint import pprint
> from operator import itemgetter
>
> def rows(filename):
>     infile = open(filename, "rb")
>     for row in DictReader(infile):
>         yield row["CATEGORIES"]
>
> def stats(values):
>     histo = defaultdict(int)
>     for v in values:
>         histo[v] += 1
>     return sorted(histo.iteritems(), key=itemgetter(1), reverse=True)
>
> Should you need the inner dict (which doesn't seem to offer any additional
> information) you can always add another step:
>
> def format(items):
>     result = []
>     for raw, count in items:
>         leaf = raw.rpartition("|")[2]
>         result.append((raw, dict(count=count, leaf=leaf)))
>     return result
>
> pprint(format(stats(rows("sampledata.csv"))), indent=4, width=60)
>
> By the way, if you had broken the problem in steps like above you could have
> offered four different stats() functions which would have been a bit
> easier to read...
>
Thank you. The code reorganization does make it easier to read. I'll have to
look up the docs on itemgetter() :)

--
http://mail.python.org/mailman/listinfo/python-list
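
On itemgetter(): a minimal sketch of what itemgetter(1) appears to be doing as
the sort key in the quoted stats(). The sample pairs and the names get_count
and by_count are made up for illustration; they only mimic the (category,
count) items of the histo dict and are not taken from sampledata.csv.

```python
from operator import itemgetter

# Hypothetical (category, count) pairs, shaped like the items of histo.
pairs = [("a|b", 2), ("a|c", 5), ("d", 1)]

# itemgetter(1) builds a callable that returns element 1 of its argument,
# so here it behaves like: lambda pair: pair[1]
get_count = itemgetter(1)

# Sort by the count, largest first, as the quoted stats() does.
by_count = sorted(pairs, key=get_count, reverse=True)

for category, count in by_count:
    print(category, count)
```

Passing several indexes, e.g. itemgetter(1, 0), makes the callable return a
tuple of those elements, which can be handy for sorting on more than one field.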