On Wed, 2010-11-17 at 21:12 -0800, Sorin Schwimmer wrote:
> Thanks for your answers.
>
> Benjamin Kaplan: of course dict is a type... silly me! I'll blame it on the
> time (it's midnight here).
>
> Chris Rebert: I'll have a look.
>
> Thank you both,
> SxN

Forgive me if this is off track; I haven't followed the thread. I do have a
little module that I believe does what you attempted to do: multiple
substitutions using a regular expression that joins a bunch of targets with
'|' in between. Whether you risk unintended translations, as Dave Angel
pointed out, where two characters or one of your targets join coincidentally,
you will have to determine. If you do, you can't use this approach. If, on the
other hand, your format is safe, it'll work just fine.

Use like this:

>>> import translator
>>> t = translator.Translator (nodia.items ())
>>> t (name)  # Your example
'Rasca'

Frederic

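A minimal sketch of the same idea, for illustration only: escape each target, join the targets with '|', and let re.sub look up every hit in a dict. The helper name multi_replace and the contents of sample_nodia are assumptions made up for this example; the thread's real nodia dict is not shown here, and this is not the module's own code.

```python
import re

def multi_replace(mapping, text):
    """Replace every key of 'mapping' found in 'text' with its value.

    Hypothetical helper, not part of the translator module.
    """
    # Longer targets first, so 'ab' is tried before 'a'.
    targets = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(t) for t in targets))
    # The callback looks up each matched target in the mapping.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Illustrative stand-in for the thread's 'nodia' dict (assumed contents).
sample_nodia = {u'\u0103': u'a', u'\u0219': u's', u'\u021b': u't'}
print(multi_replace(sample_nodia, u'Ra\u0219c\u0103'))
```

Because every target is passed through re.escape, characters such as '.' or '+' in a target are matched literally, which matches limitation 2 in the module's docstring.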
class Translator:

    r"""
    Will translate any number of targets, handling them correctly if some overlap.

    Making Translator
        T = Translator (definitions, [eat = 1])
        'definitions' is a sequence of pairs: ((target, substitute),(t2, s2), ...)
        'eat = True' will make an extraction filter that lets only the replaced targets pass.
        Definitions example: (('a','A'),('b','B'),('ab','ab'),('abc','xyz'),
                              ('\x0c', 'page break'), ('\r\n','\n'), (' ','\t'))  # ('ab','ab') see Tricks.
        Order doesn't matter.

    Testing
        T.test (). Translates the definitions and prints the result. All targets
        must look like the substitutes as defined. If a substitute differs, it has
        been affected by the translation. (E.g. 'A'|'A' ... 'page break'|'pAge BreAk').
        If this is not intended---the effect can be useful---protect the affected
        substitute by translating it to itself. See Tricks.

    Running
        translation = T (source)

    Tricks
        Deletion:  ('target', '')
        Exception: (('\n',''), ('\n\n','\n\n'))       # Eat LF except paragraph breaks.
        Exception: (('\n', '\r\n'), ('\r\n','\r\n'))  # Unix to DOS, would leave DOS unchanged
        Translation cascade:
            # Unwrap paragraphs, Unix or DOS, restoring inter-word space if missing,
            Mark_LF = Translator ((('\n','+LF+'),('\r\n','+LF+'),('\n\n','\n\n'),('\r\n\r\n','\r\n\r\n')))
            # Pick positively identifiable mark for end of lines in either Unix or MS-DOS.
            Single_Space_Mark = Translator (((' +LF+', ' '),('+LF+', ' '),('-+LF+', '')))
            no_lf_text = Single_Space_Mark (Mark_LF (text))
        Translation cascade:
            reptiles = T_latin_english (T_german_latin (reptilien))

    Limitations
        1. The number of substitutions and the maximum size of input depends on the
           respective capabilities of the Python re module.
        2. Regular expressions will not work as such but will be handled literally.

    Author:
        Frederic Rentsch (i...@anthra-norell.ch).
    """

    def __init__ (self, definitions, eat = 0):

        r'''
        definitions: a sequence of pairs of strings. ((target, substitute), (t, s), ...)
        eat: False (0) means translate: unaffected data passes unaltered.
             True (1) means extract: unaffected data doesn't pass (gets eaten).
             Extraction filters typically require substitutes to end with some
             separator, else they fuse together. (E.g. ' ', '\t' or '\n')
        'eat' is an attribute that can be switched anytime.
        '''

        self.eat = eat
        self.compile_sequence_of_pairs (definitions)


    def compile_sequence_of_pairs (self, definitions):

        '''
        Argument 'definitions' is a sequence of pairs:
        (('target 1', 'substitute 1'), ('t2', 's2'), ...)
        Order doesn't matter.
        '''

        import re
        self.definitions = definitions
        targets, substitutes = zip (*definitions)
        re_targets = [re.escape (item) for item in targets]
        # Reverse sort puts a longer target ahead of any target that is its prefix.
        re_targets.sort (reverse = True)
        self.targets_set = set (targets)
        self.table = dict (definitions)
        regex_string = '|'.join (re_targets)
        self.regex = re.compile (regex_string, re.DOTALL)


    def __call__ (self, s):
        hits = self.regex.findall (s)
        nohits = self.regex.split (s)
        valid_hits = set (hits) & self.targets_set  # Ignore targets with illegal re modifiers.
        if valid_hits:
            substitutes = [self.table [item] for item in hits if item in valid_hits]
            if self.eat:
                return ''.join (substitutes)
            else:
                # Interleave the untouched stretches with the substitutes.
                # split () yields one more piece than findall (), hence the tail.
                pieces = [nohit + substitute for nohit, substitute in zip (nohits, substitutes)]
                return ''.join (pieces) + nohits [-1]
        else:
            if self.eat:
                return ''
            else:
                return s


    def test (self):

        '''
        Translates the definitions and prints the result. All targets must look
        like the substitutes as defined. If a substitute differs, it has been
        affected by the translation, indicating a potential problem, should the
        substitute occur in the source.
        '''

        targets_translated = [self (item [0]) for item in self.definitions]
        substitutes = [self (item [1]) for item in self.definitions]
        for item in [(repr (targets_translated [n]), repr (substitutes [n])) for n in range (len (substitutes))]:
            print ('%s|%s' % item)

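The reverse sort in compile_sequence_of_pairs matters because Python's re module tries alternatives from left to right and takes the first one that matches, not the longest. A small sketch of the difference, using made-up targets rather than anything from the thread:

```python
import re

targets = ['a', 'ab', 'abc']

# Unsorted: 'a' is tried first and wins, so 'ab' and 'abc' never match whole.
naive = re.compile('|'.join(re.escape(t) for t in targets))
print(naive.findall('abcab'))      # ['a', 'a']

# Reverse-sorted: a target always precedes any target that is its own prefix.
ordered = re.compile('|'.join(sorted((re.escape(t) for t in targets), reverse=True)))
print(ordered.findall('abcab'))    # ['abc', 'ab']
```

This works because a string always sorts before any longer string that starts with it, so the reversed order lists the longer one first.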
-- http://mail.python.org/mailman/listinfo/python-list