2010/8/25 Jed <jedmelt...@gmail.com>:
> Hi, I'm seeking help with a fairly simple string processing task.
> I've simplified what I'm actually doing into a hypothetical
> equivalent.
> Suppose I want to take a word in Spanish, and divide it into
> individual letters. The problem is that there are a few 2-character
> combinations that are considered single letters in Spanish - for
> example 'ch', 'll', 'rr'.
> Suppose I have:
>
> alphabet = ['a','b','c','ch','d','u','r','rr','o']  # this would include the whole alphabet but I shortened it here
> theword = 'churro'
>
> I would like to split the string 'churro' into a list containing:
>
> 'ch','u','rr','o'
>
> So at each letter I want to look ahead and see if it can be combined
> with the next letter to make a single 'letter' of the Spanish
> alphabet. I think this could be done with a regular expression
> passing the list called "alphabet" to re.match() for example, but I'm
> not sure how to use the contents of a whole list as a search string in
> a regular expression, or if it's even possible. My real application
> is a bit more complex than the Spanish alphabet so I'm looking for a
> fairly general solution.
> Thanks,
> Jed
> --
> http://mail.python.org/mailman/listinfo/python-list
Hi,
I am not sure whether it can be generalised enough for your needs, but you can try something like:

>>> import re
>>> re.findall(r"rr|ll|ch|[a-z]", "asdasdallasdrrcvb")
['a', 's', 'd', 'a', 's', 'd', 'a', 'll', 'a', 's', 'd', 'rr', 'c', 'v', 'b']

Of course, the pattern has to be adjusted carefully in order not to lose characters: findall silently skips anything the pattern does not match (with [a-z] as written, that includes accented letters and 'ñ')...

hth,
vbr
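To address the original question of using a whole list as the search pattern, one possible sketch builds the alternation directly from the alphabet list. It assumes the greedy "longest letter first" rule is what you want; the helper name make_splitter is hypothetical. Entries are sorted longest-first because regex alternation takes the first alternative that matches rather than the longest, and each entry goes through re.escape in case it contains regex metacharacters. Comparing the joined tokens with the input catches the characters that findall would otherwise silently drop.

```python
import re


def make_splitter(alphabet):
    """Return a function that splits a word into 'letters' of `alphabet`.

    Hypothetical helper: assumes a greedy longest-match rule is correct
    for the alphabet in question.
    """
    # Longest entries first, so 'ch' is tried before 'c' and 'rr' before 'r';
    # alternation picks the first alternative that matches, not the longest.
    ordered = sorted(alphabet, key=len, reverse=True)
    # Escape each entry in case the real alphabet contains regex metacharacters.
    pattern = re.compile("|".join(re.escape(letter) for letter in ordered))

    def split(word):
        tokens = pattern.findall(word)
        # findall skips characters it cannot match; if anything was skipped,
        # the joined tokens are shorter than the original word.
        if "".join(tokens) != word:
            raise ValueError("word contains characters not in the alphabet: %r" % word)
        return tokens

    return split


alphabet = ['a', 'b', 'c', 'ch', 'd', 'u', 'r', 'rr', 'o']
split_word = make_splitter(alphabet)
print(split_word('churro'))  # ['ch', 'u', 'rr', 'o']
```

A plain greedy scan may still mis-split genuinely ambiguous inputs in a more complex real alphabet; in that case the rule for choosing between overlapping letters has to be decided separately.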