Hi, I am developing code (Python 3.4) that transforms text data from one format to another.
As part of the process, I had a set of hard-coded str.replace(...) calls that I used to clean up the incoming text into the desired output format, something like this:

    dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds
    dataIn = dataIn.replace('<','<') # Tidy up < character
    dataIn = dataIn.replace('>','>') # Tidy up > character
    dataIn = dataIn.replace('o','o') # No idea why but lots of these: convert to 'o' character
    dataIn = dataIn.replace('f','f') # .. and these: convert to 'f' character
    dataIn = dataIn.replace('e','e') # .. 'e'
    dataIn = dataIn.replace('O','O') # .. 'O'

These statements transform my data correctly, but the list grows as I test more data, so I thought it made sense to store the replacement mappings in a file, read them into a dict and loop through that to do the cleaning up, like this:

    with open(fileName, 'r+t', encoding='utf-8') as mapFile:
        for line in mapFile:
            line = line.strip()
            try:
                if line and not line.startswith('#'):
                    line = line.split('#')[0].strip() # trim any trailing comments
                    name, value = line.split('=')
                    name = name.strip()
                    self.filterMap[name] = value.strip()
            except:
                self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName))
                raise

Elsewhere, I use the following code to do the actual cleaning up:

    def filter(self, dataIn):
        if dataIn:
            for token, replacement in self.filterMap.items():
                dataIn = dataIn.replace(token, replacement)
        return dataIn

My mapping file contents look like this:

    \r = \\n
    â = "
    < = <
    > = >
    ' = '
    F = F
    o = o
    f = f
    e = e
    O = O

This all works "as advertised" *except* for the '\r' => '\\n' replacement. Debugging the code, I see that my '\r' character is "escaped" to '\\r', and the '\\n' to '\\\\n', when they are read in from the file.

I've been googling hard and reading the Python docs, trying to get my head around character encoding, but I just can't figure out how to get these bits of code to do what I want.
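What the debugger shows is expected: when Python reads the line \r = \\n from a text file, the token is the two literal characters backslash and 'r', which repr() displays as '\\r'. Escape sequences are only interpreted inside string literals in source code, not in data read from a file. Below is a minimal sketch of transforming the tokens as they are read, assuming the mapping file keeps the "token = replacement" format shown above. The names decode_escapes, load_filter_map and apply_filters are hypothetical, and an io.StringIO stands in for the real mapping file:

```python
import io


def decode_escapes(text):
    """Interpret backslash escape sequences in text read from a file,
    leaving other non-ASCII characters (such as 'â') intact."""
    # 'unicode_escape' decodes bytes as Latin-1, so encode to Latin-1 first;
    # 'backslashreplace' turns anything outside Latin-1 into a \uXXXX escape,
    # which 'unicode_escape' then turns back into the original character.
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')


def load_filter_map(map_file):
    """Parse 'token = replacement' lines, skipping blank lines and # comments."""
    filter_map = {}
    for line in map_file:
        line = line.split('#', 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        name, value = line.split('=', 1)      # split on the first '=' only
        filter_map[decode_escapes(name.strip())] = decode_escapes(value.strip())
    return filter_map


def apply_filters(data, filter_map):
    """Apply every token -> replacement pair to the input text."""
    for token, replacement in filter_map.items():
        data = data.replace(token, replacement)
    return data


# Stand-in for the mapping file: raw strings keep the backslashes exactly
# as they would appear in the file on disk.
sample = io.StringIO('\n'.join([
    r'# comment line',
    r'\r = \\n    # carriage return -> literal backslash + n',
    r'â = "',
]))
filter_map = load_filter_map(sample)

print(len(r'\r'))  # 2: a backslash and an 'r', which is what the file contains
print(filter_map)
print(apply_filters('line one\rline two', filter_map))  # line one\nline two
```

The Latin-1/backslashreplace round trip is there because a plain codecs.decode(text, 'unicode_escape') can mangle non-ASCII tokens such as 'â'. With this approach a literal backslash in a token has to be written as \\ in the file, and a token ending in a lone backslash will raise UnicodeDecodeError. If changing the file format is preferable, a JSON mapping such as {"\r": "\\n"} read with json.load gives the escapes the same meaning they have in Python source.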
It seems to me that I need to either:

  * change the way I represent '\r' and '\\n' in my mapping file; or
  * transform them somehow when I read them in.

However, I haven't figured out how to do either of these.

TIA,
--
Rob Hills
Waikiki, Western Australia
-- https://mail.python.org/mailman/listinfo/python-list