Latest approach to controlling non-printable / multi-byte characters

metaperl Thu, 08 Feb 2007 12:26:10 -0800

There is no end to the number of frantic pleas for help with
characters in the realm beyond ASCII.


However, in searching through them, I do not see a workable approach to
changing them into other things.

I am dealing with a file in which my Emacs editor shows
"MASSACHUSETTS-AMHERST" ... in other words, there is a dash between
MASSACHUSETTS and AMHERST.

However, if I grep for the text, the shell returns this:

MASSACHUSETTSâ&#128;&#147;AMHERST

and od -tc returns this:

0000540        O   F       M   A   S   S   A   C   H   U   S   E   T   T
0000560    S 342 200 223   A   M   H   E   R   S   T   ;       U   N   I


So, the conclusion is that the "dash" is actually 3 bytes, which od shows
in octal as 342 200 223. My goal is to take those 3 bytes and convert
them to an ASCII dash. Any idea how I might write such a filter? The
closest I have got is:

unicodedata.normalize('NFKD', s).encode('ASCII', 'replace')

but that puts a question mark there.
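
For illustration, here is a minimal sketch of such a filter. It assumes
the file is UTF-8, in which case the bytes 342 200 223 decode to U+2013
(EN DASH). That character has no compatibility decomposition, so NFKD
leaves it alone and the ASCII encoder falls back to "?". The names
DASH_MAP and to_ascii are made up for this example, the list of
dash-like characters is an assumption to extend as needed, and the
syntax is Python 3.

```python
import unicodedata

# Hypothetical mapping for this sketch: dash-like code points -> ASCII "-".
DASH_MAP = {
    0x2010: "-",  # HYPHEN
    0x2011: "-",  # NON-BREAKING HYPHEN
    0x2012: "-",  # FIGURE DASH
    0x2013: "-",  # EN DASH (342 200 223 in UTF-8)
    0x2014: "-",  # EM DASH
    0x2212: "-",  # MINUS SIGN
}


def to_ascii(raw: bytes) -> str:
    """Decode UTF-8 bytes and return an ASCII-only string."""
    text = raw.decode("utf-8")                  # assumes the input is UTF-8
    text = text.translate(DASH_MAP)             # swap the dashes first
    text = unicodedata.normalize("NFKD", text)  # then decompose the rest
    # Anything still outside ASCII becomes "?", so it stays visible.
    return text.encode("ascii", "replace").decode("ascii")


raw = b"MASSACHUSETTS\342\200\223AMHERST"
print(unicodedata.name(raw[13:16].decode("utf-8")))  # EN DASH
print(to_ascii(raw))  # MASSACHUSETTS-AMHERST
```

Doing the translate step before the normalize/encode step is the point:
the explicit table handles characters that normalization cannot reduce
to ASCII on its own.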

-- 
http://mail.python.org/mailman/listinfo/python-list
