James Stroud wrote:
> Frank Millman wrote:
>
>> Hi all
>>
>> This is probably old hat to most of you, but for me it was a
>> revelation, so I thought I would share it in case someone has a similar
>> requirement.
>>
>> I had to convert an old program that does a traditional pass through a
>> sorted data file, breaking on a change of certain fields, processing
>> each row, accumulating various totals, and doing additional processing
>> at each break. I am not using a database for this one, as the file
>> sizes are not large - a few thousand rows at most. I am using csv
>> files, and using the csv module so that each row is nicely formatted
>> into a list.
>>
>> The traditional approach is quite fiddly, saving the values of the
>> various break fields, comparing the values on each row with the saved
>> values, and taking action if the values differ. The more break fields
>> there are, the fiddlier it gets.
>>
>> I was going to do the same in python, but then I vaguely remembered
>> reading about 'groupby'. It took a little while to figure it out, but
>> once I had cracked it, it transformed the task into one of utter
>> simplicity.
>>
>> Here is an example. Imagine a transaction file sorted by branch,
>> account number, and date, and you want to break on all three.
>>
>> -----------------------------
>> import csv
>> from itertools import groupby
>> from operator import itemgetter
>>
>> BRN = 0
>> ACC = 1
>> DATE = 2
>>
>> reader = csv.reader(open('trans.csv', 'rb'))
>> rows = []
>> for row in reader:
>>     rows.append(row)
>>
>> for brn,brnList in groupby(rows,itemgetter(BRN)):
>>     for acc,accList in groupby(brnList,itemgetter(ACC)):
>>         for date,dateList in groupby(accList,itemgetter(DATE)):
>>             for row in dateList:
>>                 [do something with row]
>>             [do something on change of date]
>>         [do something on change of acc]
>>     [do something on change of brn]
>> -----------------------------
>>
>> Hope someone finds this of interest.
>>
>> Frank Millman
>>
>
> I'm sure I'm going to get a lot of flak on this list for proposing to
> turn nested for-loops into a recursive function, but I couldn't help
> myself. This seems more simple to me, but for others it may be difficult
> to look at, and these people will undoubtedly complain.
>
>
> import csv
> from itertools import groupby
> from operator import itemgetter
>
> reader = csv.reader(open('trans.csv', 'rb'))
> rows = []
> for row in reader:
>     rows.append(row)
>
> def brn_doer(row):
>     [doing something with brn here]
>
> def acc_doer(date):
>     [you get the idea]
>
> [etc.]
>
> doers = [brn_doer, acc_doer, date_doer, row_doer]
>
> def doit(rows, doers, i=0):
>     for r, alist in groupby(rows, itemgetter(i)):
>         doit(alist, doers[1:], i+1)
>         doers[0](r)
>
> doit(rows, doers, 0)
>
> Now all of those ugly for loops become one recursive function. Bear in
> mind, it's not all that 'elegant', but it looks nicer, is more succinct,
> abstracts the process, and scales to arbitrary depth. Tragically,
> however, it has been generalized, which is likely to raise some hackles
> here. And, oh yes, it didn't answer exactly your question (which you
> didn't really have). I'm sure I will regret this because, as you will
> find, suggesting code on this list with additional utility is somewhat
> discouraged by the vociferous few who make a religion out of 'import this'.
>
> Also, I still have no idea what 'groupby' does. It looks interesting
> though, thanks for pointing it out.
>
> James
>
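For reference, itertools.groupby yields (key, group) pairs for runs of *consecutive* items that share a key, which is why the file must already be sorted on the break fields. Nested loops work because each inner group is fully consumed before the outer loop advances. Below is a minimal runnable sketch of the three-level break from the quoted post, assuming Python 3 and a small hypothetical in-memory CSV with an amount in column 3 (the sample data, the AMT column and the totals are illustrative only, not from the original post):

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

BRN, ACC, DATE = 0, 1, 2
AMT = 3  # hypothetical amount column, assumed for illustration

# Hypothetical sample data, already sorted by branch, account and date.
SAMPLE = """\
B1,A1,2006-01-01,10
B1,A1,2006-01-01,5
B1,A1,2006-01-02,7
B1,A2,2006-01-01,3
B2,A3,2006-01-03,4
"""

def control_break(rows):
    """Accumulate per-date, per-account and per-branch totals with nested groupby."""
    totals = {"brn": {}, "acc": {}, "date": {}}
    for brn, brn_rows in groupby(rows, itemgetter(BRN)):
        brn_total = 0
        for acc, acc_rows in groupby(brn_rows, itemgetter(ACC)):
            acc_total = 0
            for date, date_rows in groupby(acc_rows, itemgetter(DATE)):
                date_total = 0
                for row in date_rows:
                    date_total += int(row[AMT])  # do something with row
                totals["date"][(brn, acc, date)] = date_total  # change of date
                acc_total += date_total
            totals["acc"][(brn, acc)] = acc_total  # change of acc
            brn_total += acc_total
        totals["brn"][brn] = brn_total  # change of brn
    return totals

rows = list(csv.reader(io.StringIO(SAMPLE)))
totals = control_break(rows)
print(totals["brn"])  # {'B1': 25, 'B2': 4}
```

If the rows were not sorted, groupby would simply start a new group every time the key changed, so the same branch could appear more than once.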
Forgot to test for stopping condition:

def doit(rows, doers, i=0):
    for r, alist in groupby(rows, itemgetter(i)):
        if len(doers) > 1:
            doit(alist, doers[1:], i+1)
        doers[0](r)

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
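With that stopping condition, the last handler in doers is called with the value of the next column (row[3] when there are four handlers), not with the rows themselves. Below is a minimal sketch of a variant, assuming the last entry in doers should instead receive each individual row, so the stopping condition moves to the top of the function. The handlers and sample rows are hypothetical, for illustration only:

```python
from itertools import groupby
from operator import itemgetter

def doit(rows, doers, i=0):
    """Break on column i, handle the inner levels, then fire this level's handler.

    Assumed layout: one break handler per key column, followed by a single
    row handler, so len(doers) == number of key columns + 1.
    """
    if len(doers) == 1:
        # Stopping condition: no key columns left, pass each row to the row handler.
        for row in rows:
            doers[0](row)
        return
    for key, group in groupby(rows, itemgetter(i)):
        doit(group, doers[1:], i + 1)  # process everything under this key first
        doers[0](key)                  # then act on the change of this key

# Hypothetical handlers that just record what they were called with.
log = []
doers = [
    lambda brn: log.append(("brn", brn)),
    lambda acc: log.append(("acc", acc)),
    lambda row: log.append(("row", row)),
]
rows = [["B1", "A1", 10], ["B1", "A1", 5], ["B1", "A2", 3], ["B2", "A3", 4]]
doit(rows, doers)
for entry in log:
    print(entry)
```

Each group is consumed completely by the recursive call before groupby moves to the next key, which is the same property the nested-loop version relies on.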