On Mar 23, 4:16 pm, Peter Otten <__pete...@web.de> wrote:
> CaptainMcCrank wrote:
> > I'm struggling with a problem analyzing large amounts of unicode data
> > in an http wireshark capture.
> > I've solved the problem with the interpreter, but I'm not sure how to
> > do this in an automated fashion.
>
> > I'd like to grab a line from a text file & translate the unicode
> > sections of it to ascii. So, for example
> > I'd like to take
> > "\u003cb\u003eMar 17\u003c/b\u003e"
>
> > and turn it into
>
> > "<b>Mar 17</b>"
>
> > I can handle this from the interpreter as follows:
>
> > >>> import unicodedata
> > >>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
> > >>> print mystring
> > <b>Mar 17</b>
>
> > But I don't know what I need to do to automate this! The data that is
> > in the quotes from line 2 will have to come from a variable. I am
> > unable to figure out how to do this using a variable rather than a
> > literal string.
>
> If wireshark uses the same escape codes as python you can use str.decode()
> or open the file with codecs.open():
>
> >>> s = "\u003cb\u003eMar 17\u003c/b\u003e"
> >>> s
> '\\u003cb\\u003eMar 17\\u003c/b\\u003e'
> >>> s.decode("unicode-escape")
> u'<b>Mar 17</b>'
>
> >>> open("tmp.txt", "w").write(s)
> >>> import codecs
> >>> f = codecs.open("tmp.txt", "r", encoding="unicode-escape")
> >>> f.read()
> u'<b>Mar 17</b>'
>
> Peter
This is a workable solution! Thank you Peter!
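For anyone finding this thread later: the session above is Python 2, and in Python 3 a str no longer has a decode() method. What follows is a minimal Python 3 sketch of the same idea, not Peter's code. It assumes the escaped capture text is pure ASCII, and the helper names decode_escapes and decode_file are hypothetical. Note that the unicode_escape codec interprets every Python string-literal escape (\n, \t, \\ and so on), not only \uXXXX, so a literal backslash sequence in the capture will be converted too.

```python
import os
import tempfile


def decode_escapes(text):
    """Hypothetical helper: turn literal \\uXXXX escapes in text into characters.

    Assumes text is pure ASCII; encoding to ASCII first makes non-ASCII
    input raise UnicodeEncodeError instead of being silently mangled.
    """
    return text.encode("ascii").decode("unicode_escape")


def decode_file(path):
    """Hypothetical helper: decode every line of a file of escaped text.

    Reads raw bytes and decodes each line with unicode_escape, roughly what
    codecs.open(..., encoding="unicode-escape") does in the Python 2 example.
    """
    with open(path, "rb") as f:
        # Strip the line ending (LF or CRLF) after decoding.
        return [line.decode("unicode_escape").rstrip("\r\n") for line in f]


if __name__ == "__main__":
    # The escaped text held in a variable rather than a literal.
    escaped = r"\u003cb\u003eMar 17\u003c/b\u003e"
    print(decode_escapes(escaped))

    # The same conversion applied to a whole file, line by line.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(escaped + "\n")
        tmp.write(r"\u003ci\u003eMar 18\u003c/i\u003e" + "\n")
    try:
        for line in decode_file(tmp.name):
            print(line)
    finally:
        os.remove(tmp.name)
```

Reading the file in binary mode keeps the backslashes exactly as they appear in the capture, so the codec sees the same bytes Wireshark exported.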