>> when I replace it end up with nothing: i.e., just a "" character in my
>> file.

how are you viewing the contents of your file? are you printing it out
to stdout? are you opening your file in a non-unicode aware editor?

try

    print repr(data)

after re.sub, so that you see what you actually have in data.

btw, where did you get your XML files from?
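
a minimal sketch of that check, in case it helps. the sample string and the pattern are made up for illustration (the real XML and regex aren't shown in this thread), and it is written for Python 3, where ascii() gives the escaped output that repr() gives for unicode strings in Python 2:

```python
import re

# hypothetical sample data; the real XML is not shown in the thread
data = '<title>caf\u00e9 \u2013 menu</title>'

# hypothetical substitution: replace the en dash with a plain hyphen
cleaned = re.sub('\u2013', '-', data)

# ascii() prints escapes instead of glyphs, so a terminal or editor
# that cannot display the characters cannot hide what is really there
print(ascii(data))
print(ascii(cleaned))
```

if the escaped output still contains the text you expect, the data is fine and the problem is in how the file is being displayed.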