I'm processing a lot of dirty CSV files and would like to track the bad codes that are raising UnicodeErrors. I'm struggling to figure out what the exact codes are so that I can track them, then remove them, and then repeat the decoding process for the current line until it has been fully decoded and I can pass it on to the CSV reader. At a high level, it seems that I need to wrap the decoding of a line in a loop that retries until it passes without any errors. Any suggestions appreciated.
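To make the idea concrete, here is a rough sketch of the kind of retry loop I have in mind. It assumes Python 3, lines read as bytes, and UTF-8 as the expected encoding; the names decode_tracking_errors and bad_bytes are just placeholders I made up. It relies on the start, end and object attributes that UnicodeDecodeError carries to find the offending bytes.

```python
from collections import Counter


def decode_tracking_errors(raw, encoding="utf-8", bad_bytes=None):
    """Decode raw bytes, removing and counting every byte run that fails.

    Hypothetical helper: returns (decoded_text, bad_bytes), where bad_bytes
    is a Counter keyed by the offending byte sequences.
    """
    if bad_bytes is None:
        bad_bytes = Counter()
    while True:
        try:
            return raw.decode(encoding), bad_bytes
        except UnicodeDecodeError as exc:
            # exc.start/exc.end delimit the undecodable bytes in exc.object.
            # Guard against an empty span so the loop always makes progress.
            end = max(exc.end, exc.start + 1)
            bad_bytes[exc.object[exc.start:end]] += 1
            # Drop the bad bytes and retry the same line.
            raw = raw[:exc.start] + raw[end:]


# Example with two invalid UTF-8 bytes in one line.
text, seen = decode_tracking_errors(b"abc\xffdef\xfe,1\n")
```

The cleaned lines could then be handed to csv.reader, which accepts any iterable of strings, for example a generator that opens the file in binary mode and yields decode_tracking_errors(line, bad_bytes=seen)[0] for each line. Another possibility might be a custom error handler registered with codecs.register_error that records the bad bytes and returns an empty replacement, which would avoid the explicit loop.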
Thank you,
Malcolm
--
https://mail.python.org/mailman/listinfo/python-list