On Dec 2, 8:38 pm, Michael Goerz <[EMAIL PROTECTED]> wrote:
> Michael Goerz wrote:
> > Hi,
> >
> > I am writing unicode strings into a special text file that requires
> > non-ascii characters to be stored as octal-escaped UTF-8 codes.
> >
> > For example, the letter "Í" (latin capital I with acute, code point 205)
> > would come out as "\303\215".
> >
> > I will also have to read back from the file later on and convert the
> > escaped characters back into a unicode string.
> >
> > Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> > vice versa?
> >
> > I know I can get the code point by doing
> > >>> "Í".decode('utf-8').encode('unicode_escape')
> > but there doesn't seem to be any similar method for getting the octal
> > escaped version.
> >
> > Thanks,
> > Michael
>
> I've come up with the following solution. It's not very pretty, but it
> works (no bugs, I hope). Can anyone think of a better way to do it?
>
> Michael
> _________
>
> import binascii
>
> def escape(s):
>     hexstring = binascii.b2a_hex(s)
>     result = ""
>     while len(hexstring) > 0:
>         (hexbyte, hexstring) = (hexstring[:2], hexstring[2:])
>         octbyte = oct(int(hexbyte, 16)).zfill(3)
>         result += "\\" + octbyte[-3:]
>     return result
>
> def unescape(s):
>     result = ""
>     while len(s) > 0:
>         if s[0] == "\\":
>             (octbyte, s) = (s[1:4], s[4:])
>             try:
>                 result += chr(int(octbyte, 8))
>             except ValueError:
>                 result += "\\"
>                 s = octbyte + s
>         else:
>             result += s[0]
>             s = s[1:]
>     return result
>
> print escape("\303\215")
> print unescape('adf\\303\\215adf')
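Looks like escape() can be a bit simpler...

def escape(s):
    result = []
    for char in s:
        # Zero-pad to three octal digits: unescape() reads exactly three
        # characters after each backslash, so "\12" would be misparsed.
        result.append("\\%03o" % ord(char))
    return ''.join(result)

Regards,
Jordan
--
http://mail.python.org/mailman/listinfo/python-list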
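
For anyone reading this later on Python 3, here is a rough sketch of the round trip, not a drop-in replacement for the code above. It assumes the input is a `str` rather than a byte string, that only non-ASCII characters need escaping (as in the original question), and that a literal backslash should also be written as `\134` so that unescaping stays unambiguous. That last point is my assumption, not something the file format in this thread is known to require. The name `_OCTAL_ESCAPE` is just a helper for this example.

```python
import re

def escape(text):
    """Octal-escape the UTF-8 bytes of non-ASCII characters in a str."""
    parts = []
    for byte in text.encode("utf-8"):
        if byte >= 0x80 or byte == 0x5C:
            # Non-ASCII byte, or a literal backslash (assumption: escape it
            # too, so the output never contains an ambiguous backslash).
            parts.append("\\%03o" % byte)
        else:
            parts.append(chr(byte))
    return "".join(parts)

# A backslash followed by exactly three octal digits in the range 000-377.
_OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7]{2})")

def unescape(escaped):
    """Turn octal escapes back into bytes, then decode the result as UTF-8."""
    raw = bytearray()
    pos = 0
    for match in _OCTAL_ESCAPE.finditer(escaped):
        # Copy the unescaped text before this match, then the escaped byte.
        raw += escaped[pos:match.start()].encode("utf-8")
        raw.append(int(match.group(1), 8))
        pos = match.end()
    raw += escaped[pos:].encode("utf-8")
    return raw.decode("utf-8")

print(escape("\u00cd"))
print(unescape("adf\\303\\215adf"))
```

Collecting the bytes first and decoding once at the end matters here: a single character such as "Í" is spread over two escapes, so decoding each escape on its own would fail. If the escapes do not form valid UTF-8, `unescape` raises `UnicodeDecodeError`.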