In article <[EMAIL PROTECTED]>,
David Pratt <[EMAIL PROTECTED]> wrote:
> This is very nice :-) Thank you Tony. I think this will be the way to
> go. My concern ATM is where it will be best to convert to Unicode. The
> data after this will go into a dict, through a few processes, and into
> a database. Because
David Pratt wrote:
> I am working with a text format that advises to strip any ascii control
> characters (0 - 30) as part of parsing data and also the ascii pipe
> character (124) from the data. I think many of these characters are
> from a different time. Since I have never seen most of these
This is very nice :-) Thank you Tony. I think this will be the way to
go. My concern ATM is where it will be best to convert to Unicode. The
data after this will go into a dict, through a few processes, and into
a database. Because if the input source gives no explicit encoding, I
will have to assume ISO-8859-1
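A minimal sketch of that fallback, assuming Python 3 and raw bytes as input. The name decode_record and its declared_encoding parameter are hypothetical and are not part of the format or of any code posted in this thread. ISO-8859-1 maps every byte value to a character, so the fallback never raises, although it may yield the wrong characters if the data was really in another encoding.

```python
from typing import Optional


def decode_record(raw: bytes, declared_encoding: Optional[str] = None) -> str:
    """Decode raw bytes, assuming ISO-8859-1 when no encoding is declared."""
    # Hypothetical helper: use the declared encoding when the source gives one.
    encoding = declared_encoding or "iso-8859-1"
    return raw.decode(encoding)
```

Decoding once at the input boundary means the dict, the later processing, and the database layer all see text rather than bytes.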
I bel
Hi Steve. My plan is to parse the data, removing the control characters,
and validate the data as records are being added to a dictionary. I am
going to convert to Unicode after this step but before it gets into
storage (in which case I think the translate method could work well).
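One way the plan above might look in Python 3, as a sketch only: strip the unwanted bytes with bytes.translate, then decode. The names DELETE and clean_field are made up for illustration. The range 0-31 is an assumption (the format description quoted in this thread says 0 - 30, so use range(31) if that is really what it means), and so is the ISO-8859-1 default.

```python
# Bytes to drop: ASCII control codes 0-31 (assumed range) plus the pipe (124).
DELETE = bytes(range(32)) + b"|"


def clean_field(raw: bytes, encoding: str = "iso-8859-1") -> str:
    """Remove control bytes and pipes, then decode to text."""
    # With table=None, translate() only deletes the listed bytes.
    stripped = raw.translate(None, DELETE)
    return stripped.decode(encoding)
```

Note that this also removes tab, carriage return and newline, so it is safer to apply it to individual fields after the records have been split.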
The encoding itself is
David Pratt wrote:
[about ord(), chr() and stripping control characters]
> Many thanks Steve. This is good information. I think this should work
> fine. I was doing a string.replace in a cleanData() method with the
> following characters but don't know if that would have done it. This
> contains
Many thanks Steve. This is good information. I think this should work
fine. I was doing a string.replace in a cleanData() method with the
following characters but don't know if that would have done it. This
contains all the control characters that I really know about in normal
use. ord(c) < 32
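For comparison with a chain of replace calls, a filter on ord(c) < 32 covers every control character without listing them by hand. This is a sketch in Python 3 on text that has already been decoded. clean_data is a hypothetical stand-in for the cleanData() method mentioned above, whose body is not shown in this thread.

```python
PIPE = "|"


def clean_data(text: str) -> str:
    """Drop characters with ord(c) < 32 and the pipe character."""
    # Keep a character only if it is not a control code and not a pipe.
    return "".join(c for c in text if ord(c) >= 32 and c != PIPE)
```

A hand-written list of replace calls only removes the characters someone remembered to include. The ord() test removes the whole range in one pass.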
I am working with a text format that advises to strip any ascii control
characters (0 - 30) as part of parsing data and also the ascii pipe
character (124) from the data. I think many of these characters are
from a different time. Since I have never seen most of these characters
in text I am no