Wow, this was harder than I thought (at least for a rusty Pythoneer
like myself). Here's my stab at an implementation. Remember, the
goal is to add a "match" method to Template which works like
Template.substitute, but in reverse: given a string, if that string
matches the template, then it should return a dictionary mapping each
template field to the corresponding value in the given string.
Oh, and as one extra feature, I want to support a ".greedy" attribute
on the Template object, which determines whether the matching of
fields should be done in a greedy or non-greedy manner.
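To make the greedy/non-greedy distinction concrete, here is a small illustration using plain re rather than Template. The sample string and variable names are made up for this example, and it uses re.fullmatch so that the whole string has to match:

```python
import re

# Made-up sample: " in " appears twice, so the split point is ambiguous.
s = "rain in Spain in June"

# Greedy: the first group grabs as much as it can.
greedy_groups = re.fullmatch(r'(.*) in (.*)', s).groups()
# -> ('rain in Spain', 'June')

# Non-greedy: the first group grabs as little as it can.
lazy_groups = re.fullmatch(r'(.*?) in (.*?)', s).groups()
# -> ('rain', 'Spain in June')

print(greedy_groups)
print(lazy_groups)
```

Both results are valid matches; the flag only decides which one wins when a string can be split in more than one way.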
------------------------------------------------------------
#!/usr/bin/python
from string import Template
import re

def templateMatch(self, s):
    # start by finding the fields in our template, and building a map
    # from field position (index) to field name.
    posToName = {}
    pos = 1
    for item in self.pattern.findall(self.template):
        # each item is a tuple where item 1 is the field name
        posToName[pos] = item[1]
        pos += 1

    # determine if we should match greedy or non-greedy
    # (non-greedy unless a 'greedy' attribute has been set)
    greedy = getattr(self, 'greedy', False)

    # now, build a regex pattern to compare against s
    # (taking care to escape any characters in our template that
    # would have special meaning in regex)
    pat = self.template.replace('.', '\\.')
    pat = pat.replace('(', '\\(')
    pat = pat.replace(')', '\\)')  # there must be a better way...
    if greedy:
        pat = self.pattern.sub('(.*)', pat)
    else:
        pat = self.pattern.sub('(.*?)', pat)
    p = re.compile(pat)

    # try to match this to the given string
    match = p.match(s)
    if match is None:
        return None
    out = {}
    for i in posToName.keys():
        out[posToName[i]] = match.group(i)
    return out

Template.match = templateMatch

t = Template("The $object in $location falls mainly in the $subloc.")
print(t.match("The rain in Spain falls mainly in the train."))
------------------------------------------------------------
This sort-of works, but it won't properly handle $$ in the template
(an escaped "$" gets treated as just another field), and I don't think
it handles the ${fieldname} form either, since it only looks at the
unbraced name in item[1]. Also, it only escapes '.', '(', and ')' in
the template... there must be a better way of escaping all characters
that have special meaning in a regex, except for '$' (which is why I
can't just call re.escape on the whole template).
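One possible way around the escaping problem, offered as a rough sketch rather than a finished answer: walk the template with Template.pattern.finditer, run re.escape only on the literal text between placeholders, and emit a named group for each field. The function name template_match, the greedy keyword argument, and the choice to require a full-string match via re.fullmatch are all assumptions made for this example, not part of the string module.

```python
import re
from string import Template

def template_match(template, s, greedy=False):
    """Hypothetical helper: reverse of Template.substitute (a sketch)."""
    field = '.*' if greedy else '.*?'
    names = []   # field names seen so far
    parts = []   # regex fragments, joined at the end
    last = 0
    for m in template.pattern.finditer(template.template):
        # Literal text before this placeholder: safe to escape wholesale.
        parts.append(re.escape(template.template[last:m.start()]))
        last = m.end()
        if m.group('escaped') is not None:
            # "$$" in the template stands for one literal "$".
            parts.append(re.escape(template.delimiter))
        elif m.group('invalid') is not None:
            raise ValueError('invalid placeholder in template')
        else:
            # Either $name or ${name} matched; exactly one is non-empty.
            name = m.group('named') or m.group('braced')
            if name in names:
                # A repeated field must carry the same value each time.
                parts.append('(?P=%s)' % name)
            else:
                names.append(name)
                parts.append('(?P<%s>%s)' % (name, field))
    parts.append(re.escape(template.template[last:]))
    # Assumption: the whole string must match, not just a prefix.
    match = re.fullmatch(''.join(parts), s)
    if match is None:
        return None
    return match.groupdict()

t = Template("Pay $$${amount} to $name (ref. $ref)")
result = template_match(t, "Pay $12.50 to Joe (ref. A1)")
print(result)
# -> {'amount': '12.50', 'name': 'Joe', 'ref': 'A1'}
```

Requiring a full match also matters for non-greedy mode: with a plain prefix match, a field at the very end of the template would happily match the empty string.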
Probably the rest of the code could be improved too. I'm eager to
hear your feedback.
Thanks,
- Joe