There is no end to the number of frantic pleas for help with characters in the realm beyond ASCII.
However, in searching through them, I do not see a workable approach to changing them into other things.

I am dealing with a file, and in my Emacs editor I see "MASSACHUSETTS- AMHERST"; in other words, there is a dash between MASSACHUSETTS and AMHERST. However, if I grep for the text, the shell returns this:

    MASSACHUSETTS–AMHERST

and od -tc returns this:

    0000540 O F M A S S A C H U S E T T
    0000560 S 342 200 223 A M H E R S T ; U N I

So the conclusion is that the "dash" is actually three bytes, shown in octal as 342 200 223. My goal is to take those three bytes and convert them to an ASCII dash. Any idea how I might write such a filter? The closest I have got is:

    unicodedata.normalize('NFKD', s).encode('ASCII', 'replace')

but that puts a question mark there.
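For what it is worth, the bytes 342 200 223 (octal) are the UTF-8 encoding of U+2013 EN DASH. That character has no decomposition into ASCII, so NFKD leaves it alone and the 'replace' error handler turns it into a question mark. One possible filter, as a minimal sketch: decode the bytes, map the dash-like characters to "-" explicitly, then encode. This assumes Python 3 and that the file really is UTF-8; the names DASHES and ascii_dashes are made up for illustration, and the list of dash characters is a guess at what might turn up, not an exhaustive set.

```python
# Hypothetical table: dash-like code points mapped to ASCII hyphen-minus.
DASHES = {
    0x2010: "-",  # HYPHEN
    0x2011: "-",  # NON-BREAKING HYPHEN
    0x2012: "-",  # FIGURE DASH
    0x2013: "-",  # EN DASH (UTF-8 bytes: octal 342 200 223)
    0x2014: "-",  # EM DASH
    0x2212: "-",  # MINUS SIGN
}


def ascii_dashes(raw: bytes) -> bytes:
    """Decode UTF-8 input, replace dash-like characters, return ASCII bytes."""
    text = raw.decode("utf-8")       # assumption: the input is valid UTF-8
    text = text.translate(DASHES)    # swap each listed dash for "-"
    # Anything still outside ASCII becomes "?", as in the original attempt.
    return text.encode("ascii", "replace")


sample = b"MASSACHUSETTS\342\200\223AMHERST"  # the bytes reported by od -tc
result = ascii_dashes(sample)
print(result.decode("ascii"))
```

To use it as a filter, the same function could be applied to sys.stdin.buffer.read() and the result written to sys.stdout.buffer. Mapping the characters by hand is more work than a single normalize call, but it makes the choice of replacement explicit instead of leaving it to the error handler.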