Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-06 Thread Matt Ruffalo
On 2016-08-04 15:45, Random832 wrote: > On Thu, Aug 4, 2016, at 15:22, Malcolm Greene wrote: >> Hi Chris, >> >> Thanks for your suggestions. I would like to capture the specific bad >> codes *before* they get replaced. So if a line of text has 10 bad codes >> (each one raising UnicodeError), I woul

Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread Malcolm Greene
Wow!!! A huge thank you to all who replied to this thread! Chris: You gave me some ideas I will apply in the future. MRAB: Thanks for exposing me to the extended attributes of the UnicodeError object (e.start, e.end, e.object). Mike: Cool example! I like how _cleanlines() recursively calls itse
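
For anyone reading the archive later, here is a minimal sketch of the attributes mentioned above (e.start, e.end, e.object) on a UnicodeDecodeError. The helper name describe_decode_error is made up for illustration and is not from anyone's post in this thread.

```python
def describe_decode_error(data, encoding="utf-8"):
    """Return (start, end, bad_bytes) for the first decode failure, or None.

    Hypothetical helper: it only illustrates the exception attributes.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.object is the bytes object being decoded;
        # e.start:e.end is the slice the codec could not decode.
        return e.start, e.end, e.object[e.start:e.end]
    return None


# 0xFF can never appear in valid UTF-8, so decoding fails at index 3.
print(describe_decode_error(b"abc\xff def"))  # (3, 4, b'\xff')
print(describe_decode_error(b"plain ascii"))  # None
```

Note that only the first failure is reported, because decoding stops at the first exception.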

Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread Michael Selik
On Thu, Aug 4, 2016 at 3:24 PM Malcolm Greene wrote: > Hi Chris, > > Thanks for your suggestions. I would like to capture the specific bad > codes *before* they get replaced. So if a line of text has 10 bad codes > (each one raising UnicodeError), I would like to track each exception's > bad code

Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread Chris Angelico
On Fri, Aug 5, 2016 at 5:22 AM, Malcolm Greene wrote: > Thanks for your suggestions. I would like to capture the specific bad > codes *before* they get replaced. So if a line of text has 10 bad codes > (each one raising UnicodeError), I would like to track each exception's > bad code but still ret

Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread Random832
On Thu, Aug 4, 2016, at 15:22, Malcolm Greene wrote: > Hi Chris, > > Thanks for your suggestions. I would like to capture the specific bad > codes *before* they get replaced. So if a line of text has 10 bad codes > (each one raising UnicodeError), I would like to track each exception's > bad code

Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread MRAB
On 2016-08-04 20:22, Malcolm Greene wrote: Hi Chris, Thanks for your suggestions. I would like to capture the specific bad codes *before* they get replaced. So if a line of text has 10 bad codes (each one raising UnicodeError), I would like to track each exception's bad code but still return a v

Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread Malcolm Greene
Hi Chris, Thanks for your suggestions. I would like to capture the specific bad codes *before* they get replaced. So if a line of text has 10 bad codes (each one raising UnicodeError), I would like to track each exception's bad code but still return a valid decode line when finished. My goal is
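
One way to record every bad code in a single pass and still end up with a decoded line is a custom error handler registered through codecs.register_error. This is only a sketch under assumptions: the handler name "track_and_skip" and the module-level bad_codes list are invented for illustration, and this is not necessarily what any poster in the thread suggested.

```python
import codecs

# Collected bad byte sequences, in the order the decoder hit them.
bad_codes = []


def track_and_skip(exc):
    """Error handler: remember the undecodable bytes, then drop them."""
    if isinstance(exc, UnicodeDecodeError):
        bad_codes.append(exc.object[exc.start:exc.end])
        # Return (replacement text, position at which to resume decoding).
        return "", exc.end
    raise exc


# "track_and_skip" is an arbitrary, hypothetical handler name.
codecs.register_error("track_and_skip", track_and_skip)

text = b"a\xffb\xfec".decode("utf-8", errors="track_and_skip")
print(text)       # abc
print(bad_codes)  # [b'\xff', b'\xfe']
```

The decoder calls the handler once per failure, so a line with 10 bad codes yields 10 entries without restarting the decode.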

Re: Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread Chris Angelico
On Fri, Aug 5, 2016 at 4:47 AM, Malcolm Greene wrote: > I'm processing a lot of dirty CSV files and would like to track the bad > codes that are raising UnicodeErrors. I'm struggling to figure out > what the exact codes are so I can track them, then remove them, and then > repeat the decoding

Capturing the bad codes that raise UnicodeError exceptions during decoding

2016-08-04 Thread Malcolm Greene
I'm processing a lot of dirty CSV files and would like to track the bad codes that are raising UnicodeErrors. I'm struggling to figure out what the exact codes are so I can track them, then remove them, and then repeat the decoding process for the current line until the line has been fully deco
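
The loop described in this post (decode, note the bad code, remove it, try again) might look roughly like the sketch below. The function name decode_tracking_bad_bytes is hypothetical, and UTF-8 is only an assumed default, since the preview does not say which encoding the CSV files use.

```python
def decode_tracking_bad_bytes(line, encoding="utf-8"):
    """Decode `line` (bytes), removing and recording each undecodable span.

    Returns (decoded_text, list_of_bad_byte_sequences).
    """
    bad = []
    data = line
    while True:
        try:
            return data.decode(encoding), bad
        except UnicodeDecodeError as e:
            # Record the offending bytes, cut them out, and retry.
            bad.append(e.object[e.start:e.end])
            data = data[:e.start] + data[e.end:]


text, bad = decode_tracking_bad_bytes(b"a\xffb\xfec")
print(text)  # abc
print(bad)   # [b'\xff', b'\xfe']
```

Each retry removes at least one byte, so the loop always terminates, though it re-decodes the line from the start after every failure.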