Tim wrote:

> CSV is a very common format for publishing data as a form of primitive
> integration. It's an annoyingly brittle approach, so I'd like to
> ensure that I capture errors as soon as possible, so that I can get
> the upstream processes fixed, or at worst put in some correction
> mechanisms and avoid getting polluted data into my analyses.
>
> A symptom of several types of error is that the number of fields
> being interpreted varies over a file (e.g. from wrongly embedded quote
> strings or mishandled embedded newlines). My preferred approach would
> be to get DictReader to throw an exception when encountering such
> oddities, but at the moment it seems to patch over the error, filling
> in the blanks for short lines and silently collecting the extras from
> long lines. I know that I can use the restval parameter and then check
> for what's been parsed when I get my results back, but this seems
> brittle, as whatever I use for restval could legitimately be in the
> data.
>
> Is there any way to get csv.DictReader to throw an exception on such
> simple line errors, or am I going to have to use csv.reader and
> explicitly check the number of fields read on each line?
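You're right about the default behaviour: csv.DictReader pads short rows
with restval and files the extra fields from long rows under restkey, so
nothing ever raises. A minimal illustration (the sample data is made up):

import csv
import io

data = "a,b,c\n1,2\n1,2,3,4\n"
for row in csv.DictReader(io.StringIO(data)):
    print(row)
# {'a': '1', 'b': '2', 'c': None}               short row, padded with restval
# {'a': '1', 'b': '2', 'c': '3', None: ['4']}   long row, extras under restkey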
I think you have to use csv.reader. Untested:

import csv

def DictReader(f, fieldnames=None, *args, **kw):
    reader = csv.reader(f, *args, **kw)
    if fieldnames is None:
        # Take the header from the first row, as csv.DictReader does.
        fieldnames = next(reader)
    for row in reader:
        if row:  # skip blank rows, as csv.DictReader does
            if len(fieldnames) != len(row):
                raise ValueError(
                    "expected %d fields, got %d: %r"
                    % (len(fieldnames), len(row), row))
            yield dict(zip(fieldnames, row))
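Used like the built-in, the first bad line raises instead of being papered
over. A quick sketch, assuming the generator above (the file name is made
up for illustration):

with open("data.csv", newline="") as f:
    for record in DictReader(f):
        ...  # ValueError surfaces here on the first ragged row

The length check runs per row as you iterate, so the error points at the
offending line before any polluted data reaches your analysis.

Peter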