Hi,

On 03/09/15 06:31, MRAB wrote:
> On 2015-09-02 03:03, Rob Hills wrote:
>> I am developing code (Python 3.4) that transforms text data from one
>> format to another.
>>
>> As part of the process, I had a set of hard-coded str.replace(...)
>> functions that I used to clean up the incoming text into the desired
>> output format, something like this:
>>
>>     dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds
>>     dataIn = dataIn.replace('&lt;','<') # Tidy up < character
>>     dataIn = dataIn.replace('&gt;','>') # Tidy up > character
>>     dataIn = dataIn.replace('&#111;','o') # No idea why but lots of these: convert to 'o' character
>>     dataIn = dataIn.replace('&#102;','f') # .. and these: convert to 'f' character
>>     dataIn = dataIn.replace('&#101;','e') # .. 'e'
>>     dataIn = dataIn.replace('&#79;','O') # .. 'O'
>>
> The problem with this approach is that the order of the replacements
> matters. For example, changing '&lt;' to '<' and then '&amp;' to '&'
> can give a different result to changing '&amp;' to '&' and then '&lt;'
> to '<'. If you started with the string '&amp;lt;', then the first order
> would go '&amp;lt;' => '&amp;lt;' => '&lt;', whereas the second order
> would go '&amp;lt;' => '&lt;' => '<'.

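The ordering problem MRAB describes can be reproduced with a small sketch. It assumes the two entities in question are '&amp;' and '&lt;'; the helper name apply_in_order is made up for illustration and is not from the original code. The last line shows html.unescape from the standard library, which decodes entities in a single pass and so is not affected by replacement order.

```python
import html


def apply_in_order(text, replacements):
    """Apply (old, new) pairs one after another, in the order given."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


source = '&amp;lt;'

# '&lt;' first: nothing matches yet, so only the '&amp;' step changes the text.
lt_first = apply_in_order(source, [('&lt;', '<'), ('&amp;', '&')])

# '&amp;' first: this exposes a '&lt;' that the second step then also decodes.
amp_first = apply_in_order(source, [('&amp;', '&'), ('&lt;', '<')])

print(lt_first)   # &lt;
print(amp_first)  # <

# Single-pass decoding: each entity in the input is decoded exactly once.
single_pass = html.unescape(source)
print(single_pass)  # &lt;
```

Whether '&lt;' or '<' is the right answer depends on how the incoming data was encoded, so the order (or a single-pass decoder) has to be chosen deliberately.
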
Ah yes, thanks for reminding me about that. I've since modified my code
to use a collections.OrderedDict to store my mappings.

...
>> This all works "as advertised" *except* for the '\r' => '\\n'
>> replacement. Debugging the code, I see that my '\r' character is
>> "escaped" to '\\r' and the '\\n' to '\\\\n' when they are read in from
>> the file.
>>
>> I've been googling hard and reading the Python docs, trying to get my
>> head around character encoding, but I just can't figure out how to get
>> these bits of code to do what I want.
>>
>> It seems to me that I need to either:
>>
>>   * change the way I represent '\r' and '\\n' in my mapping file; or
>>   * transform them somehow when I read them in
>>
>> However, I haven't figured out how to do either of these.
>>
> Try ast.literal_eval, although you'd need to make it look like a string
> literal first:

Thanks for the suggestion. For now, I've decided I was being too
pedantic trying to load my two escaped strings from a file and I've
simply hard coded them and moved on to other issues. I'll try this idea
later on though.

Cheers,

-- 
Rob Hills
Waikiki, Western Australia
-- 
https://mail.python.org/mailman/listinfo/python-list
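
For reference, a minimal sketch of the ast.literal_eval idea combined with an OrderedDict of mappings. The mapping-file layout (one tab-separated "old, new" pair per line) and the names unescape, load_mappings and file_lines are assumptions made for illustration; the thread does not show the actual file format. The sketch also assumes the fields contain no unescaped double quotes and do not end in a lone backslash.

```python
import ast
from collections import OrderedDict


def unescape(raw):
    """Interpret backslash escapes in raw as in a Python string literal.

    Wrapping the text in quotes makes it "look like a string literal";
    assumes raw has no unescaped double quote or trailing lone backslash.
    """
    return ast.literal_eval('"' + raw + '"')


def load_mappings(lines):
    """Build an ordered old -> new mapping from tab-separated lines."""
    mappings = OrderedDict()
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        old, new = line.split('\t')
        mappings[unescape(old)] = unescape(new)
    return mappings


# Stand-in for lines read from the (hypothetical) mapping file. Raw strings
# keep the backslashes literal, which is how they arrive from a text file.
file_lines = [r'\r' + '\t' + r'\\n' + '\n', '&lt;\t<\n']
mappings = load_mappings(file_lines)

data_in = 'one\rtwo &lt;three'
for old, new in mappings.items():  # OrderedDict preserves file order
    data_in = data_in.replace(old, new)

print(repr(data_in))
```

Because ast.literal_eval only evaluates literals, it is a safer choice than eval for text that comes from a file.
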