Bernard Rankin wrote:

> I've got several versions of code here to generate a histogram-esque
> structure from rows in a CSV file.
>
> The basic approach is to use a Dict as a bucket collection to count
> instances of data items.
>
> Other than the try/except(KeyError) idiom for dealing with new bucket
> names, which I don't like as it describes the initial state of a KeyValue
> _after_ you've just described what to do with the existing value, I've
> come up with a few other methods.
>
> What seems like the most reasonable approach?

The simplest. That would be #3, cleaned up a bit:

    from collections import defaultdict
    from csv import DictReader
    from pprint import pprint
    from operator import itemgetter

    def rows(filename):
        infile = open(filename, "rb")
        for row in DictReader(infile):
            yield row["CATEGORIES"]

    def stats(values):
        histo = defaultdict(int)
        for v in values:
            histo[v] += 1
        return sorted(histo.iteritems(), key=itemgetter(1), reverse=True)

Should you need the inner dict (which doesn't seem to offer any additional
information) you can always add another step:

    def format(items):
        result = []
        for raw, count in items:
            leaf = raw.rpartition("|")[2]
            result.append((raw, dict(count=count, leaf=leaf)))
        return result

    pprint(format(stats(rows("sampledata.csv"))), indent=4, width=60)

By the way, if you had broken the problem into steps like above, you could
have offered four different stats() functions, which would have been a bit
easier to read...

Peter
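
For readers on Python 3: the code above is written for Python 2 (iteritems(), binary-mode open). A minimal sketch of the same two steps under Python 3 might look like the following. The `column` parameter and the in-memory sample data are assumptions made for illustration; they are not part of the original post.

```python
import csv
import io
from collections import defaultdict
from operator import itemgetter


def rows(infile, column="CATEGORIES"):
    """Yield the value of one column for each row of an open CSV file."""
    for row in csv.DictReader(infile):
        yield row[column]


def stats(values):
    """Count occurrences; return (value, count) pairs, most frequent first."""
    histo = defaultdict(int)
    for v in values:
        histo[v] += 1
    # dict.items() replaces the Python 2-only iteritems()
    return sorted(histo.items(), key=itemgetter(1), reverse=True)


# Hypothetical sample data standing in for sampledata.csv.
sample = io.StringIO("CATEGORIES\na|b\na|c\na|b\n")
result = stats(rows(sample))
print(result)
```

With a real file, the csv module documentation recommends opening it in text mode with newline="", for example `open("sampledata.csv", newline="")`, and passing the file object to rows().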