On Dec 2, 11:46 pm, Michael Spencer <[EMAIL PROTECTED]> wrote:
> Michael Goerz wrote:
> > Hi,
> >
> > I am writing unicode strings into a special text file that requires
> > non-ASCII characters to be written as octal-escaped UTF-8 codes.
> >
> > For example, the letter "Í" (Latin capital I with acute, code point 205)
> > would come out as "\303\215".
> >
> > I will also have to read back from the file later on and convert the
> > escaped characters back into a unicode string.
> >
> > Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> > vice versa?
>
> Perhaps something along the lines of:
>
>  >>> def encode(source):
>  ...     return "".join("\%o" % ord(c) for c in source.encode('utf8'))
>  ...
>  >>> def decode(encoded):
>  ...     bytes = "".join(chr(int(c, 8)) for c in encoded.split('\\')[1:])
>  ...     return bytes.decode('utf8')
>  ...
>  >>> encode(u"Í")
>  '\\303\\215'
>  >>> print decode(_)
>  Í
>  >>>
>
> HTH
> Michael

Nice one. :) If I might suggest a slight variation to handle cases
where the "encoded" string contains plain text as well as octal
escapes...

    import re

    def decode(encoded):
        # Replace each \ooo escape with the byte it names; plain text is left alone.
        for octc in re.findall(r'\\(\d{3})', encoded):
            encoded = encoded.replace(r'\%s' % octc, chr(int(octc, 8)))
        return encoded.decode('utf8')

This way it can handle both "\\141\\144\\146\\303\\215\\141\\144\\146"
and "adf\\303\\215adf".

Regards,
Jordan
--
http://mail.python.org/mailman/listinfo/python-list
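
P.S. The snippets in this thread are Python 2, where str is a byte string. For anyone on Python 3, here is a rough sketch of the same idea, not a drop-in replacement. It rests on a few assumptions that are mine and not from the thread: the file contents are handled as ASCII text (str), ASCII characters are written unescaped, and a literal backslash is itself escaped (as \134) so that the round trip stays unambiguous. Whether the target file format accepts that last point is something to check. The name _OCTAL_ESCAPE is just a helper for this example.

```python
import re

# One backslash followed by three octal digits, limited to byte values 0-255.
_OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7]{2})")


def encode(source: str) -> str:
    """Octal-escape non-ASCII UTF-8 bytes (and backslashes); keep other ASCII."""
    return "".join(
        "\\%03o" % b if b >= 0x80 or b == 0x5C else chr(b)
        for b in source.encode("utf-8")
    )


def decode(encoded: str) -> str:
    """Turn \\ooo escapes back into bytes, then decode the bytes as UTF-8."""
    # A single re.sub pass never rescans its own output, so a decoded
    # backslash cannot combine with following digits into a new escape.
    raw = _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), encoded)
    # latin-1 maps code points 0-255 one-to-one onto byte values.
    return raw.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(encode("Í"))
    print(decode("adf\\303\\215adf"))
```

Using re.sub with a callback instead of repeated str.replace calls means each escape is handled exactly once, in place, which matters when the text mixes escapes with ordinary digits.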