Wow!!! A huge thank you to all who replied to this thread!

Chris: You gave me some ideas I will apply in the future.

MRAB: Thanks for exposing me to the extended attributes of the UnicodeError 
object (e.start, e.end, e.object).
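For future readers, here is a minimal sketch of those attributes in action. It deliberately triggers a UnicodeDecodeError and reports where it happened. The helper name bad_byte_span is hypothetical and was not part of MRAB's reply:

```python
def bad_byte_span(data, encoding="utf-8"):
    """Return (start, end, bad_bytes) for the first decode error, or None.

    bad_byte_span is a hypothetical helper used only for illustration.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.object is the full input; e.start:e.end bounds the offending slice
        return e.start, e.end, e.object[e.start:e.end]
    return None

print(bad_byte_span(b"abc\xffdef"))  # (3, 4, b'\xff')
```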

Mike: Cool example! I like how _cleanlines() recursively calls itself to keep 
cleaning up a line after an error is handled. Your code solved the mystery of 
how to recover from a UnicodeError and keep decoding.
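The same recovery idea can also be written as a loop instead of recursion. This is only a sketch, not Mike's _cleanlines(); the function name decode_with_markers and the '<?>' marker are my own assumptions. It also assumes a stateless encoding such as UTF-8, where everything before the first error decodes cleanly on its own:

```python
def decode_with_markers(data, encoding="utf-8", marker="<?>"):
    """Decode bytes, replacing each undecodable chunk with marker.

    Hypothetical helper: retries decoding from just past each error,
    using e.start and e.end (relative to the slice being decoded).
    """
    parts = []
    pos = 0
    while pos < len(data):
        try:
            parts.append(data[pos:].decode(encoding))
            break
        except UnicodeDecodeError as e:
            # Keep the valid text that precedes the bad bytes
            parts.append(data[pos:pos + e.start].decode(encoding))
            parts.append(marker)
            # Resume decoding right after the offending bytes
            pos += e.end
    return "".join(parts)

print(decode_with_markers(b"abc\xffdef\xfeg"))  # abc<?>def<?>g
```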

Random832: Your suggestion to write a custom codecs error handler was great. A 
sample is below for future readers of this thread.

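# simple codecs custom error handler
import codecs

def custom_unicode_error_handler(e):
    # Only decode errors carry bytes in e.object; let anything else propagate
    if not isinstance(e, UnicodeDecodeError):
        raise e
    # e.start:e.end is the slice of e.object that could not be decoded
    bad_bytes = e.object[e.start:e.end]
    print('Bad bytes: ' + bad_bytes.hex())
    # Return (replacement text, position at which to resume decoding)
    return ('<?>', e.end)

codecs.register_error('custom_unicode_error_handler',
                      custom_unicode_error_handler)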
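A variation for anyone who wants to inspect the bad bytes afterwards instead of printing them: the handler below appends each offending slice to a list. The handler and registered names (collecting_handler, 'collect_bad_bytes') are hypothetical; once registered, the name can be passed as errors= to bytes.decode() or open():

```python
import codecs

bad_chunks = []  # offending byte slices seen by the handler

def collecting_handler(e):
    # Hypothetical handler: only decode errors are handled in this sketch
    if not isinstance(e, UnicodeDecodeError):
        raise e
    bad_chunks.append(e.object[e.start:e.end])
    # Substitute a marker and resume decoding after the bad bytes
    return ("<?>", e.end)

codecs.register_error("collect_bad_bytes", collecting_handler)

text = b"abc\xffdef".decode("utf-8", errors="collect_bad_bytes")
print(text)        # abc<?>def
print(bad_chunks)  # [b'\xff']
```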

Malcolm

-- 
https://mail.python.org/mailman/listinfo/python-list