Dear John (cc python-list),

>> You may find (1) that the file has formfeeds in
>> it or (2) it has r"\f" in it and you were mistaken about the
>> interpretation or (3) something else.
>>
> ...
>
>> Thank you for the quick response. Ultimately I need to remap the "f" in
>> "\f" to something else, so I worked around the problem by doing the
>> remapping first, and I'm now getting the desired result.
>>
>
> Please reply on-list.
>
> How could you read the file to remap an "f" if you were getting '\0x0C'
> when you tried to read it? Are we to assume that it was case (2) i.e.
> not a Python problem?
>

Possibly more than anyone else on-list cares to see, but it was case (3): I had misdiagnosed the input. The match was failing because I was reading the line improperly (when I remapped the "f" I was ... er ... inexplicably surprised when I couldn't find it, although it turned out to be there when I looked for the remapped value instead of for the original "f"). When I tried to troubleshoot it in an interpreter window, I misread the results, which is what prompted my inquiry on the list. Here's the interpreter diagnosis:

    >>> string1 = "blah \fR40\fC blah"
    >>> string1
    'blah \x0cR40\x0cC blah'
    >>> string2 = "blah \\fR40\\fC blah"
    >>> string2
    'blah \\fR40\\fC blah'

If I create a file that consists of:

    <?xml version="1.0" encoding="UTF-8"?>
    <test>
    <line>Hi, there</line>
    <line>blah \fR40\fC blah</line>
    <line>Hi, there</line>
    </test>

And then a python script that reads:

    import codecs
    import re

    file = open("test1.xml", "r")
    nePat = re.compile("\\\\f.")
    for line in file:
        print line
        print nePat.sub("TEST", line)

the relevant line comes out as:

    <line>blah TEST40TEST blah</line>

which is what I want. That is, the script was, indeed, reading the character string correctly, as you suggested, and the substitution that I ran into during my test in the interpreter window was a red herring. Thanks again for the advice to look more closely.

Best,

David
--
http://mail.python.org/mailman/listinfo/python-list
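
P.S. For anyone who trips over the same thing, here is a minimal sketch of the distinction, assuming Python 3 syntax (the script above uses the older print statement). The variable names are made up for illustration and are not from my actual script. It contrasts a string literal, where Python itself turns \f into a form feed, with text as it arrives from a file, where the backslash and the "f" stay two separate characters.

```python
import re

# A non-raw literal: Python converts \f into a single form feed (0x0C).
with_formfeed = "blah \fR40\fC blah"

# A raw literal stands in for text read from a file: reading never
# interprets escape sequences, so the backslash and "f" both survive.
with_backslash = r"blah \fR40\fC blah"

# Same pattern as "\\\\f." in the script, written as a raw string:
# a literal backslash, an "f", then any one character.
ne_pat = re.compile(r"\\f.")

from_file = ne_pat.sub("TEST", with_backslash)
from_literal = ne_pat.sub("TEST", with_formfeed)

print(repr(from_file))      # 'blah TEST40TEST blah'
print(repr(from_literal))   # unchanged: no backslash to match
```

The second result is why a test typed into the interpreter can mislead: the literal never contained a backslash in the first place, so the pattern has nothing to match, while the same characters read from a file match as expected.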