Hi, I'm hoping someone can help me. I'm hopelessly lost.
I'm trying to make a change in some XML files using a regular expression (re.sub). I can capture the text I want to replace OK, but when I replace it I end up with nothing where the captured text should be, i.e., just a "" character in my file.

    data = re.sub(r'(?i)(?u)<title><emph typestyle=\"bf\">Sample Title</emph></title><para indent=\"none\" runin=\"1\"><emph typestyle=\"bf\">\—(.*?):</emph>', '<title><icon name="graphic"/> <emph typestyle="bf">Sample Title—\1:</emph></title><para indent="none" runin="1">', data)

I think my problem is that I don't understand Unicode, or even know how my XML is encoded, because there is nothing in the XML declaration at the top of the file. I'd be grateful if someone could give a little advice or point me in the right direction. I've read a bunch of stuff on the board but nothing seems to click.

I'm guessing I have to decode my file when I read it, something like this:

    raw = inputFile.read()
    fileencoding = "utf-8"
    data = raw.decode(fileencoding)

and then write it out similarly, but this doesn't seem to work.

Any help appreciated,
Greg
--
http://mail.python.org/mailman/listinfo/python-list
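
For reference, the decode, substitute, encode round trip described above might look roughly like the sketch below. It is written for Python 3 and rests on several assumptions that the post does not confirm: that the files are UTF-8 (the XML default when the declaration names no encoding and there is no byte order mark), that `re.IGNORECASE` is an acceptable stand-in for the inline `(?i)`, and that the helper names `retitle` and `retitle_file` are purely illustrative. The one deliberate difference from the call in the post is that the replacement is a raw string: in an ordinary string literal, `'\1'` is an octal escape for the control character U+0001 rather than a backreference to group 1.

```python
import re

# Pattern adapted from the post. re.IGNORECASE stands in for the inline (?i);
# (?u) is already the default for str patterns in Python 3.
PATTERN = re.compile(
    r'<title><emph typestyle="bf">Sample Title</emph></title>'
    r'<para indent="none" runin="1"><emph typestyle="bf">—(.*?):</emph>',
    re.IGNORECASE,
)

# Raw string, so \1 reaches re.sub as a backreference to group 1.
# In a normal (non-raw) literal, '\1' would be the single character U+0001.
REPLACEMENT = (
    r'<title><icon name="graphic"/> <emph typestyle="bf">Sample Title—\1:</emph></title>'
    r'<para indent="none" runin="1">'
)


def retitle(data):
    """Apply the substitution to text that has already been decoded."""
    return PATTERN.sub(REPLACEMENT, data)


def retitle_file(in_path, out_path, encoding="utf-8"):
    """Decode on read, substitute, encode on write.

    The utf-8 default is an assumption; pass the real encoding if it differs.
    """
    with open(in_path, encoding=encoding) as f:
        data = f.read()
    with open(out_path, "w", encoding=encoding) as f:
        f.write(retitle(data))
```

Calling `retitle_file("in.xml", "out.xml")` (hypothetical file names) would rewrite one file; checking the result for a stray `"\x01"` is a quick way to tell whether the backreference was interpreted as intended.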