On Fri, Aug 5, 2016 at 5:22 AM, Malcolm Greene <pyt...@bdurham.com> wrote:
> Thanks for your suggestions. I would like to capture the specific bad
> codes *before* they get replaced. So if a line of text has 10 bad codes
> (each one raising UnicodeError), I would like to track each exception's
> bad code but still return a valid decode line when finished.
Interesting. Sounds to me like the simplest option is to open the file as
binary, split it on b"\n", and decode line by line before giving it to the
csv module. The csv.reader "csvfile" argument doesn't actually have to be
a file - it can be anything that yields lines. So you can put a generator
in between, like this:

def decode(binary):
    for line in binary:
        try:
            yield line.decode("utf-8")
        except UnicodeDecodeError:
            log_stats()
            # Still hand the csv module a usable line, with the bad
            # bytes replaced by U+FFFD.
            yield line.decode("utf-8", errors="replace")

def read_dirty_file(fn):
    with open(fn, "rb") as f:
        for row in csv.DictReader(decode(f)):
            process(row)

Or what Random said, which is also viable.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
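One caveat with the try/except approach: line.decode() raises on the first bad byte only, so a line with ten bad codes reports just one exception. If you want every offending byte logged, a sketch using a custom codecs error handler works (the handler name "log_replace" and the bad_bytes list are my own invention, not anything Malcolm posted):

```python
import codecs

bad_bytes = []  # collects every raw byte sequence that failed to decode

def log_and_replace(exc):
    # Record the offending bytes, then substitute U+FFFD and resume
    # decoding at the position just past them.
    bad_bytes.append(exc.object[exc.start:exc.end])
    return ("\ufffd", exc.end)

codecs.register_error("log_replace", log_and_replace)

# Latin-1 bytes embedded in otherwise valid UTF-8: two separate errors.
line = b"caf\xe9 and na\xefve"
decoded = line.decode("utf-8", errors="log_replace")
```

Because the handler resumes after each failure, decode() never raises and the whole line comes back in one call, with every bad byte captured along the way.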