On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills <rhi...@medimorphosis.com.au> wrote:
> My mapping file contents look like this:
>
> \r = \\n
> “ = "

Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're
likely to have to cope with a whole pile of other mojibake if that
happens :( You have my sympathy.

> < = <
> > = >
> ' = '
> F = F
> o = o
> f = f
> e = e
> O = O
>
> This all works "as advertised" except for the '\r' => '\\n' replacement.
> Debugging the code, I see that my '\r' character is "escaped" to '\\r' and
> the '\\n' to '\\\\n' when they are read in from the file.

Technically, what's happening is that your "\r" is literally a
backslash followed by the letter r; the transformation of backslash
sequences into single characters is part of Python source code
parsing. (Incidentally, why do you want to change a carriage return
into backslash-n? Seems odd.)

Probably the easiest solution would be a simple and naive replace(),
looking for some very specific strings and ignoring everything else.
Easy to do, but potentially confusing down the track if someone tries
something fancy :)

line = line.split('#')[:1][0].strip() # trim any trailing comments
line = line.replace(r"\r", "\r")
# repeat this for as many backslash escapes as you want to handle

Be aware that this, while simple, is NOT capable of handling escaped
backslashes. In Python, "\\r" comes out the same as r"\r", but with
this parser, it would come out the same as "\\\r". But it might be
sufficient for you.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
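
To make the escaped-backslash caveat concrete, here is a minimal sketch that puts the naive replace() next to a single-pass alternative. The function names and the ESCAPES table are hypothetical, made up for illustration; the regex-based pass is a swapped-in technique, not something the original code used, and it may well be more than you need.

```python
import re

# Hypothetical table of supported escapes; extend it as needed.
ESCAPES = {"r": "\r", "n": "\n", "t": "\t", "\\": "\\"}


def naive_unescape(text):
    """The simple approach from above: one replace() per escape."""
    return text.replace(r"\r", "\r")


def unescape(text):
    """Expand escapes in one left-to-right pass.

    An escaped backslash is consumed as a unit, so the character
    after it can no longer be misread as the start of an escape.
    """
    def expand(match):
        # Unknown escapes are left untouched rather than guessed at.
        return ESCAPES.get(match.group(1), match.group(0))

    return re.sub(r"\\(.)", expand, text)


# File text backslash-r becomes a carriage return either way.
print(repr(naive_unescape(r"\r")), repr(unescape(r"\r")))

# File text backslash-backslash-r: the naive version yields a
# backslash plus CR, the single-pass version a backslash plus "r".
print(repr(naive_unescape(r"\\r")), repr(unescape(r"\\r")))
```

With the single-pass version, a mapping value written as \\n in the file comes out as a literal backslash followed by n, which appears to be what the '\r' => '\\n' entry is after.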