Re: Stripping ASCII codes when parsing

David Pratt Mon, 17 Oct 2005 10:51:26 -0700

This is very nice :-)  Thank you, Tony.  I think this will be the way to
go.  My concern at the moment is where it is best to convert to unicode.
After this step the data goes into a dict, through a few processes, and
then into a database. Because the input source carries no explicit
encoding, I believe I will have to assume ISO-8859-1, though it could
well be cp1252 for the most part (the format disallows control
characters 0-30 but allows characters 128-254, and most of the users are
on Windows).  I am thinking of stripping these characters and validating
the text first, then converting to unicode (utf-8) so that it is unicode
in the dict. The other processes should then see uniform data, and it
will go into the database as unicode.  I think this should be OK.
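
A minimal sketch of that order of operations (strip first, then decode),
written for present-day Python 3 rather than the Python 2 used in this
thread. The helper name clean_field, the cp1252 default and the
errors="replace" policy are assumptions for illustration only; the
format's own spec may call for something stricter.

```python
# Bytes 0-30 plus the pipe character (124), as the format advises.
# Note that this range includes tab (9), newline (10) and CR (13).
DELETE_BYTES = bytes(range(31)) + b"|"


def clean_field(raw: bytes, encoding: str = "cp1252") -> str:
    """Strip the unwanted bytes, then decode to a text (unicode) string.

    Hypothetical helper: the name and defaults are not from the thread.
    """
    # With table=None, bytes.translate only deletes the listed bytes.
    stripped = raw.translate(None, DELETE_BYTES)
    # cp1252 leaves a few byte values (such as 0x81) undefined, so a
    # strict decode can raise; "replace" substitutes U+FFFD instead.
    return stripped.decode(encoding, errors="replace")


# Example: a Latin-1 style accented byte, a pipe and a NUL byte.
record = {"name": clean_field(b"caf\xe9|\x00")}
```

Under these assumptions the dict holds ordinary text strings, and an
encoding such as utf-8 only comes into play when the text is written
out, for example to the database or a file.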


Regards,
David

On Monday, October 17, 2005, at 01:48 PM, Tony Nelson wrote:

> In article <[EMAIL PROTECTED]>,
>  David Pratt <[EMAIL PROTECTED]> wrote:
>
>> I am working with a text format that advises to strip any ascii
>> control characters (0 - 30) as part of parsing data and also the
>> ascii pipe character (124) from the data. I think many of these
>> characters are from a different time. Since I have never seen most
>> of these characters in text I am not sure how these first 30 control
>> characters are all represented (other than say tab (\t),
>> newline(\n), line return(\r) ) so what should I do to remove these
>> characters if they are ever encountered. Many thanks.
>
> Most of those characters are hard to see.
>
> Represent arbitrary characters in a string in hex: "\x00\x01\x02" or
> with chr(n).
>
> If you just want to remove some characters, look into "".translate().
>
> nullxlate = "".join([chr(n) for n in xrange(256)])
> delchars = nullxlate[:31] + chr(124)
> outputstr = inputstr.translate(nullxlate, delchars)
> _______________________________________________________________________
> TonyN.:'                         [EMAIL PROTECTED]
>       '                                   <http://www.georgeanelson.com/>
> -- 
> http://mail.python.org/mailman/listinfo/python-list
>
