On Thu, 13 Sep 2012 21:34:52 -0500, Tim Chase wrote:

> On 09/13/12 21:09, Mark Tolonen wrote:
>> On Thursday, September 13, 2012 4:53:13 PM UTC-7, Tim Chase wrote:
>>> Vlastimil's solution kept the characters but stripped them of their
>>> accents/tildes/cedillas/etc, doing just what I wanted, all using the
>>> stdlib. Hard to do better than that :-)
>>
>> How about using UTF-7 for transmission and decode on the other end?
>> This keeps the transmission all 7-bit, and no loss.
>>
>> >>> s=u"serviço móvil".encode('utf-7')
>> >>> print s
>> servi+AOc-o m+APM-vil
>> >>> print s.decode('utf-7')
>> serviço móvil
>
> Nice if I control both ends of the pipe. Unfortunately, I only control
> what goes in, and I want it to be as un-screw-uppable as possible when
> it comes out the other end (may be web, CSV files, PDFs, FTP'ed file
> dumps, spreadsheets, word-processing documents, etc), and us-ascii is
> the lowest-common-denominator of unscrewuppableness while requiring
> nothing of the other end. :-)

Wrong. It requires support for US-ASCII. What if the other end is an IBM
mainframe using EBCDIC?

Frankly, I am appalled that you are intentionally perpetuating the
ignorance of US-ASCII-only applications, not because you have no choice
about inter-operating with some ancient, brain-dead application, but
because you artificially choose to follow an obsolete *and incorrect*
standard. It is *incorrect* because you can change the meaning of text by
stripping accents and deleting characters. Consequences can include murder
and suicide:

http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail

At least tell me that "ASCII only" is merely an *option* for your
application, not the only choice, and that it defaults to UTF-8 which is
the right standard to use for text.

-- 
Steven
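
For anyone who wants to see the trade-offs side by side, here is a rough Python 3 sketch. It is only an illustration: `strip_accents` is a hypothetical helper built on `unicodedata.normalize`, which may or may not match what Vlastimil actually posted, and the sample strings other than "serviço móvil" are my own. The codec names (`utf-7`, `utf-8`, `cp500`) are the ones shipped in the standard library.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: decompose characters, then drop anything non-ASCII.

    This is lossy by design. Combining marks are thrown away, and any
    character with no ASCII decomposition is deleted outright.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


original = "serviço móvil"

# Lossy: accents are stripped and cannot be recovered at the other end.
stripped = strip_accents(original)

# Stripping can change the word itself: Spanish "año" (year) becomes "ano".
changed = strip_accents("año")

# Characters without a decomposition, such as Turkish dotless i (U+0131),
# are not converted to a lookalike; they simply disappear.
deleted = strip_accents("\u0131")

# Lossless alternatives: UTF-7 stays 7-bit, UTF-8 does not, both round-trip.
utf7_bytes = original.encode("utf-7")
utf8_bytes = original.encode("utf-8")
utf7_is_7bit = all(b < 0x80 for b in utf7_bytes)

# Even plain ASCII text assumes an ASCII-compatible receiver: the same
# letter has a different byte value under an EBCDIC code page (cp500).
ascii_a = "A".encode("ascii")
ebcdic_a = "A".encode("cp500")
```

The stripped form is the only one of the three that cannot be turned back into the original text, which is why making it an opt-in output mode rather than the default seems the safer design.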