Hi Chris,

On 03/09/15 06:10, Chris Angelico wrote:
> On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills <rhi...@medimorphosis.com.au>
> wrote:
>> My mapping file contents look like this:
>>
>> \r = \\n
>> “ = "
> Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're
> likely to have to cope with a whole pile of other mojibake if that
> happens :(

Yeah, tell me about it!!!

> Technically, what's happening is that your "\r" is literally a
> backslash followed by the letter r; the transformation of backslash
> sequences into single characters is part of Python source code
> parsing. (Incidentally, why do you want to change a carriage return
> into backslash-n? Seems odd.)
>
> Probably the easiest solution would be a simple and naive replace(),
> looking for some very specific strings and ignoring everything else.
> Easy to do, but potentially confusing down the track if someone tries
> something fancy :)
>
> line = line.split('#')[:1][0].strip() # trim any trailing comments
> line = line.replace(r"\r", "\r") # repeat this for as many backslash
> escapes as you want to handle
>
> Be aware that this, while simple, is NOT capable of handling escaped
> backslashes. In Python, "\\r" comes out the same as r"\r", but with
> this parser, it would come out the same as "\\\r". But it might be
> sufficient for you.

Thanks for the explanation, which has helped me understand the problem.
I also tried your approach but wound up with output data that somehow
had every single character escaped :-(

I've since decided I was being too obsessive trying to load
*everything* from my mapping file, so I have simply hard-coded my two
escaped character replacements for now and moved on to more important
problems (i.e. the Windoze character soup that comprises my data and
which I have to clean up!).

Thanks again,

--
Rob Hills
Waikiki, Western Australia

--
https://mail.python.org/mailman/listinfo/python-list
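
For the archives, here is a minimal sketch of one way the escaped-backslash
limitation mentioned above could be avoided: decode every backslash
sequence in a single left-to-right pass instead of chaining replace()
calls. This is not what either poster ended up using. The names ESCAPES,
unescape and parse_mapping_line are hypothetical, the " = " separator is
assumed from the two sample mapping lines, and the set of supported
escapes is an arbitrary choice.

```python
import re

# Hypothetical escape table: only the sequences this sketch supports.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}

# A backslash followed by any single character (other than a newline).
_ESCAPE_RE = re.compile(r"\\(.)")


def unescape(text):
    """Decode backslash sequences in one pass, so an escaped backslash
    is consumed before the character after it can be misread."""
    def repl(match):
        # Unknown sequences are left exactly as written.
        return ESCAPES.get(match.group(1), match.group(0))
    return _ESCAPE_RE.sub(repl, text)


def parse_mapping_line(line):
    """Return (source, target) for a 'source = target' line,
    or None for a blank or comment-only line."""
    line = line.split("#")[0].strip()  # trim any trailing comment
    if not line:
        return None
    source, sep, target = line.partition(" = ")
    if not sep:
        raise ValueError("expected 'source = target': %r" % line)
    # Strip before decoding, so a decoded "\r" or "\n" is not stripped away.
    return unescape(source.strip()), unescape(target.strip())


if __name__ == "__main__":
    print(repr(parse_mapping_line(r"\r = \\n")))
```

With this approach the file text \\r decodes to a backslash followed by
the letter r, matching what Python does with "\\r". One caveat that still
applies: splitting on '#' means a mapping whose source or target is a
literal '#' cannot be expressed without further work.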