Hello (I'm new here),
Below is an extension to the standard module re. The point is to allow writing and testing sub-expressions individually, then nesting them into a super-expression. It is more or less like using a parser generator, but keeping regex grammar and power.

I used the format {sub_expr_name}: since in standard regexes {} is only used to express a repetition count, a pair of curly braces enclosing an identifier should not conflict.

The extension is new and has been tested very little. I would enjoy comments, criticism, etc. I would also like to know whether you find such a feature useful. You will probably find the code simple enough ;-)

Denis
------
la vida e estranya (life is strange)

===============
# coding: utf-8

'''
super_regex

Define & check sub-patterns individually,
then include them in a global super-pattern.

Uses the format {name} for inclusion:
    sub1 = Regex(...)
    sub2 = Regex(...)
    super_format = "...{sub1}...{sub2}..."
    # final regex object:
    super_regex = superRegex(super_format)
'''

from re import compile as Regex

# sub-pattern inclusion format: an identifier between curly braces
sub_pattern = Regex(r"\{[a-zA-Z_][a-zA-Z_0-9]*\}")

# sub-pattern expander
def sub_pattern_expansion(inclusion, dic=None):
    name = inclusion.group()[1:-1]
    ### namespace dict may be specified -- else globals()
    if dic is None:
        dic = globals()
    if name not in dic:
        raise NameError("Cannot find sub-pattern '%s'." % name)
    return dic[name].pattern

# super-pattern generator
def superRegex(format):
    expanded_format = sub_pattern.sub(sub_pattern_expansion, format)
    return Regex(expanded_format)

if __name__ == "__main__":
    # purely artificial example use
    # pattern
    time = Regex(r"\d\d:\d\d:\d\d")   # hh:mm:ss
    code = Regex(r"\S{5}")            # non-whitespace x 5
    desc = Regex(r"[\w\s]+$")         # alphanum|space --> EOL
    ref_format = "^ref: {time} #{code} --- {desc}"
    ref_regex = superRegex(ref_format)
    # output
    print('super pattern:\n"%s" ==>\n"%s"\n' % (ref_format, ref_regex.pattern))
    text = "ref: 12:04:59 #%+.?% --- foo 987 bar"
    result = ref_regex.match(text)
    print('text: "%s" ==>\n"%s"' % (text, result.group()))

--
http://mail.python.org/mailman/listinfo/python-list
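One caveat with pasting a sub-pattern's `.pattern` text in verbatim: if the sub-pattern contains a top-level alternation, its `|` is not confined to the inclusion point and splits the whole super-pattern. The sketch below (plain Python 3 `re`; the names `INCLUSION`, `expand` and `subs` are illustrative only, not part of the module above) contrasts the verbatim expansion with one that wraps each inclusion in a non-capturing group `(?:...)`.

```python
import re

# Inclusion syntax as in the post: an identifier between curly braces.
# A capture group is used here instead of slicing off the braces.
INCLUSION = re.compile(r"\{([A-Za-z_][A-Za-z_0-9]*)\}")

def expand(fmt, namespace, group=True):
    """Replace each {name} in fmt with namespace[name].pattern.

    With group=True the included text is wrapped in (?:...), so an
    alternation inside the sub-pattern stays local to its inclusion.
    """
    def repl(m):
        name = m.group(1)
        if name not in namespace:
            raise NameError("Cannot find sub-pattern '%s'." % name)
        pat = namespace[name].pattern
        return "(?:%s)" % pat if group else pat
    return INCLUSION.sub(repl, fmt)

subs = {"ans": re.compile(r"yes|no")}
fmt = "^answer: {ans}$"

naive = re.compile(expand(fmt, subs, group=False))  # ^answer: yes|no$
safe = re.compile(expand(fmt, subs))                # ^answer: (?:yes|no)$

# The naive version means "^answer: yes" OR "no$", which is not intended.
print(naive.pattern, bool(naive.match("answer: no")))  # ^answer: yes|no$ False
print(safe.pattern, bool(safe.match("answer: no")))    # ^answer: (?:yes|no)$ True
```

Wrapping changes nothing for sub-patterns without alternation, so it seems a reasonable default. Note also that only `.pattern` is copied, so flags given to `compile()` for a sub-pattern (e.g. `re.IGNORECASE`) are not carried over; scoped inline flags such as `(?i:...)` inside the sub-pattern would be needed for that.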
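Another point worth checking: `globals()` called inside `sub_pattern_expansion` returns the globals of the module where that function is defined, so once the code is imported as a library it will not see sub-patterns defined in the caller's module or inside a function (and `superRegex` never passes `dic` anyway). A minimal sketch, assuming an explicit mapping and/or keyword arguments are acceptable as the lookup table; `super_regex`, `parse_stamp` and the timestamp format are hypothetical:

```python
import re

# Same inclusion syntax as in the post: {identifier}
_INCLUSION = re.compile(r"\{([A-Za-z_][A-Za-z_0-9]*)\}")

def super_regex(fmt, namespace=None, **subs):
    """Hypothetical variant of superRegex with an explicit lookup table.

    Sub-patterns are taken from `namespace` (e.g. locals()) and/or
    keyword arguments; keyword arguments win on name clashes.
    """
    table = dict(namespace or {})
    table.update(subs)

    def repl(m):
        name = m.group(1)
        try:
            sub = table[name]
        except KeyError:
            raise NameError("Cannot find sub-pattern '%s'." % name) from None
        # accept compiled patterns as well as plain pattern strings
        pat = sub.pattern if hasattr(sub, "pattern") else sub
        return "(?:%s)" % pat

    return re.compile(_INCLUSION.sub(repl, fmt))

def parse_stamp(text):
    # sub-patterns are local variables here, handed over via locals()
    hhmmss = re.compile(r"\d\d:\d\d:\d\d")
    date = re.compile(r"\d{4}-\d\d-\d\d")   # {4} is not an identifier, left as is
    stamp = super_regex(r"{date}T{hhmmss}", locals())
    # super-patterns nest: stamp is itself used as a sub-pattern
    line = super_regex(r"^\[{stamp}\] (?P<msg>.*)$", stamp=stamp)
    return line.match(text)

m = parse_stamp("[2009-03-14T12:04:59] disk full")
print(m.group("msg"))  # disk full
```

Since a super-regex is itself a compiled pattern, it can be included in a further super-pattern, as `stamp` is above. One thing to watch with nesting: a named group such as `(?P<msg>...)` inside a sub-pattern that is included twice makes `re.compile` fail with a group-name redefinition error.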