Re: Stripping ASCII codes when parsing

2005-10-17 Thread Tony Nelson
In article <[EMAIL PROTECTED]>, David Pratt <[EMAIL PROTECTED]> wrote:
> This is very nice :-) Thank you Tony. I think this will be the way to
> go. My concern ATM is where it will be best to unicode. The data after
> this will go into dict and a few processes and into database. Because

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Erik Max Francis
David Pratt wrote:
> I am working with a text format that advises to strip any ascii control
> characters (0 - 30) as part of parsing data and also the ascii pipe
> character (124) from the data. I think many of these characters are
> from a different time. Since I have never seen most of these

Re: Stripping ASCII codes when parsing

2005-10-17 Thread David Pratt
This is very nice :-) Thank you Tony. I think this will be the way to go. My concern ATM is where it will be best to unicode. The data after this will go into dict and a few processes and into database. Because input source if not explicit encoding, I will have to assume ISO-8859-1 I bel

Re: Stripping ASCII codes when parsing

2005-10-17 Thread David Pratt
Hi Steve. My plan is to parse the data removing the control characters and validate the data as records are being added to a dictionary. I am going to Unicode after this step but before it gets into storage (in which case I think the translate method could work well). The encoding itself is
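The ordering described here (strip first, convert to Unicode before storage) might look roughly like the sketch below. It assumes Python 3, raw input arriving as bytes, and ISO-8859-1 as the fallback encoding mentioned elsewhere in the thread. The names `DELETE_BYTES` and `clean_and_decode` are made up for illustration and are not from any of the posts.

```python
# Bytes to drop: control codes 0-30 as stated in the original question,
# plus the ASCII pipe character (124).
DELETE_BYTES = bytes(range(0, 31)) + b"|"


def clean_and_decode(raw, encoding="iso-8859-1"):
    """Strip unwanted bytes, then decode to a Unicode string.

    The default encoding is an assumption, used only when the input
    source does not declare one explicitly.
    """
    # bytes.translate(None, delete) removes every byte listed in `delete`.
    stripped = raw.translate(None, DELETE_BYTES)
    return stripped.decode(encoding)
```

Stripping at the byte level is safe here because ISO-8859-1 is a single-byte encoding, so none of the removed values can be part of a longer character sequence. That would not hold for every encoding.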

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Tony Nelson
In article <[EMAIL PROTECTED]>, David Pratt <[EMAIL PROTECTED]> wrote:
> I am working with a text format that advises to strip any ascii control
> characters (0 - 30) as part of parsing data and also the ascii pipe
> character (124) from the data. I think many of these characters are
> from a

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Steve Holden
David Pratt wrote:
[about ord(), chr() and stripping control characters]
> Many thanks Steve. This is good information. I think this should work
> fine. I was doing a string.replace in a cleanData() method with the
> following characters but don't know if that would have done it. This
> contains

Re: Stripping ASCII codes when parsing

2005-10-17 Thread David Pratt
Many thanks Steve. This is good information. I think this should work fine. I was doing a string.replace in a cleanData() method with the following characters but don't know if that would have done it. This contains all the control characters that I really know about in normal use. ord(c) < 32
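For comparison with a chain of replace calls, a filter built on the `ord(c) < 32` test could be sketched as follows. This assumes Python 3. The name `clean_data` only echoes the cleanData() method mentioned above and is not the poster's actual code. Note that `ord(c) < 32` also removes character 31, which is slightly wider than the 0 - 30 range quoted from the format's advice.

```python
def clean_data(text, extra="|"):
    """Drop every character below code point 32, plus any in `extra`.

    `extra` defaults to the pipe character (124); both the function
    name and the parameter are hypothetical.
    """
    return "".join(c for c in text if ord(c) >= 32 and c not in extra)
```

A single pass like this covers all control codes at once, including ones that a hand-written list of replacements might miss. It also removes tabs and newlines, so it only suits data where those are not meaningful.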

Re: Stripping ASCII codes when parsing

2005-10-17 Thread Steve Holden
David Pratt wrote:
> I am working with a text format that advises to strip any ascii control
> characters (0 - 30) as part of parsing data and also the ascii pipe
> character (124) from the data. I think many of these characters are
> from a different time. Since I have never seen most of these

Stripping ASCII codes when parsing

2005-10-17 Thread David Pratt
I am working with a text format that advises to strip any ascii control characters (0 - 30) as part of parsing data and also the ascii pipe character (124) from the data. I think many of these characters are from a different time. Since I have never seen most of these characters in text I am no
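A minimal sketch of the stripping described in this question, assuming Python 3 and text already held as a `str`. It takes the 0 - 30 range literally as written. `STRIP_CODES`, `DELETE_TABLE` and `strip_codes` are illustrative names only.

```python
# Code points to remove: control characters 0-30 as stated in the
# question, plus the ASCII pipe character (124).
STRIP_CODES = list(range(0, 31)) + [124]

# str.translate deletes any character whose code point maps to None.
DELETE_TABLE = dict.fromkeys(STRIP_CODES)


def strip_codes(text):
    """Return `text` with the listed code points removed."""
    return text.translate(DELETE_TABLE)
```

Used on a sample record, `strip_codes("a\x00b|c\td\n")` gives `"abcd"`. Character 31 is left alone because it falls outside the stated range, so widen `STRIP_CODES` if the format really means all codes below 32.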