CaptainMcCrank wrote:
> I'm struggling with a problem analyzing large amounts of unicode data
> in an http wireshark capture.
> I've solved the problem with the interpreter, but I'm not sure how to
> do this in an automated fashion.
>
> I'd like to grab a line from a text file & translate the unicode
> sections of it to ascii. So, for example
> I'd like to take
> "\u003cb\u003eMar 17\u003c/b\u003e"
>
> and turn it into
>
> "<b>Mar 17</b>"
>
> I can handle this from the interpreter as follows:
>
> >>> import unicodedata
> >>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
> >>> print mystring
> <b>Mar 17</b>
> >>>
>
> But I don't know what I need to do to automate this! The data that is
> in the quotes from line 2 will have to come from a variable. I am
> unable to figure out how to do this using a variable rather than a
> literal string.

If wireshark uses the same escape codes as python you can use str.decode()
or open the file with codecs.open():

>>> s = "\u003cb\u003eMar 17\u003c/b\u003e"
>>> s
'\\u003cb\\u003eMar 17\\u003c/b\\u003e'
>>> s.decode("unicode-escape")
u'<b>Mar 17</b>'
>>> open("tmp.txt", "w").write(s)
>>> import codecs
>>> f = codecs.open("tmp.txt", "r", encoding="unicode-escape")
>>> f.read()
u'<b>Mar 17</b>'

Peter
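
For the "data comes from a variable" part of the question, here is a minimal sketch of the same unicode-escape idea wrapped in a function. It assumes Python 3 (the session above is Python 2, where str.decode() exists) and assumes the captured text is pure ASCII apart from the \uXXXX escapes. The names decode_escapes and decode_lines are made up for illustration.

```python
def decode_escapes(text):
    """Turn literal backslash-u escapes in text into the characters they name."""
    # Assumption: text is ASCII-only, so encoding it to bytes cannot fail.
    # Python 3 str has no .decode(), hence the round trip through bytes.
    return text.encode("ascii").decode("unicode_escape")


def decode_lines(lines):
    """Decode each line of an iterable (for example an open text file)."""
    return [decode_escapes(line.rstrip("\n")) for line in lines]


# The value sits in a variable, just as it would after reading it from a file.
line = r"\u003cb\u003eMar 17\u003c/b\u003e"
print(decode_escapes(line))  # prints <b>Mar 17</b>
```

Passing an open file object to decode_lines should work the same way, since iterating a text file yields one line at a time. Input containing non-ASCII characters would raise UnicodeEncodeError under the ASCII assumption above.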