On 2013-04-23 13:36, Neil Cerutti wrote:
> On 2013-04-22, Colin J. Williams <c...@ncf.ca> wrote:
> > Since I'm only interested in one or two columns, the simpler
> > approach is probably better.
>
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
>
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)
>     majr_index = header.index('MAJR')
>     div_index = header.index('DIV')
>     for rec in reader:
>         major = rec[majr_index]
>         rec[div_index] = DIVISION_TABLE[major]
>
> But a csv.DictReader might still be more efficient. I never
> tested. This is the only place I've used this "optimization".
> It's fast enough. ;)

I believe the csv module does all the work at the C level, rather than
in pure Python, so it should be notably faster. The only times I've
had to do things by hand like that are when there are header
peculiarities that I can't control, such as mismatched case or
added/removed punctuation (client files are notorious for this). So I
often end up doing something like

    def normalize(header):
        return header.strip().upper()  # other cleanup as needed

    reader = csv.reader(f)
    headers = next(reader)
    # map each normalized header name to its column index
    header_map = dict(
        (normalize(header), i)
        for i, header in enumerate(headers)
        )
    # item() looks up the loop variable `row` at call time
    item = lambda col: row[header_map[col]].strip()
    for row in reader:
        major = item("MAJR").upper()
        division = item("DIV")
        # ...

The function calls might add overhead, in which case one could just
use explicit indirect indexing for each value assignment:

    major = row[header_map["MAJR"]].strip().upper()

But I usually find that processing CSV files leaves me I/O bound
rather than CPU bound.

-tkc

-- 
http://mail.python.org/mailman/listinfo/python-list
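
For anyone who wants to try the header-map approach above without a real file, here is a rough, self-contained sketch in Python 3. It is only an illustration: the sample data, the read_majors() name, and the trailing-punctuation cleanup in normalize() are assumptions made up for this example, not part of anyone's actual project.

```python
import csv
import io


def normalize(header):
    """Strip whitespace and stray '.'/':' characters, then upper-case.

    The punctuation cleanup is an assumed example of "other cleanup";
    real client files may need something different.
    """
    return header.strip().strip(".:").upper()


def read_majors(lines):
    """Return (major, division) pairs from an iterable of CSV lines."""
    reader = csv.reader(lines)
    headers = next(reader)
    # Map each normalized header name to its column index.
    header_map = {normalize(h): i for i, h in enumerate(headers)}
    results = []
    for row in reader:
        # Explicit indirect indexing, no helper function call.
        major = row[header_map["MAJR"]].strip().upper()
        division = row[header_map["DIV"]].strip()
        results.append((major, division))
    return results


# Hypothetical input with inconsistent header case, spacing and punctuation.
SAMPLE = " majr ,Name,Div.\r\nmath,Ann, Science \r\nhist,Bob,Arts\r\n"

rows = read_majors(io.StringIO(SAMPLE, newline=""))
print(rows)
```

With a real file, the same function should work if passed a file object opened with newline='' as in Neil's example.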