Proposal: Named RE variables
============================

The problem I have is that I am writing a 'good-enough' Verilog tag extractor as a long regular expression (with the 'x' flag for readability), and I find myself:

  1) repeating sections of the RE, and
  2) wanting to add '(?P<some_clarifier>...)' around sections, because I know what the section does but don't really want the group.
I would like to be able to write:

    (?P/verilog_name/ [A-Za-z_][A-Za-z_0-9\$\.]* | \\\S+ )

...and have the RE parser extract the section of the RE after the second '/' and store it under the name that appears between the first two '/'. At this point the RE should NOT try to match anything between the outer '(' ')' pair; it should only store it. Then the following code, appearing later in the RE:

    (?P=verilog_name)

...should retrieve the named RE snippet and insert it into the RE in place of the '(?P=...)' group, before the RE is interpreted as normal.

Instead of writing the following to search for event declarations:

    vlog_extract = r'''(?smx)
        # Verilog event definition extraction
        (?:
            event \s+ [A-Za-z_][A-Za-z_0-9\$\.]* \s*
            (?: , \s* [A-Za-z_][A-Za-z_0-9\$\.]*)*
        )
        '''

I could write the following RE, which I think is clearer:

    vlog_extract = r'''(?smx)
        # Verilog identifier definition
        (?P/IDENT/ [A-Za-z_][A-Za-z_0-9\$\.]* (?!\.) )
        # Verilog event definition extraction
        (?:
            event \s+ (?P=IDENT) \s*
            (?: , \s* (?P=IDENT))*
        )
        '''

Extension: named RE variables with arguments
============================================

In this extension, group references in the body of a variable definition (such as \1) refer to the literal contents of the groups that appear after the variable name, inside the variable reference, at the point where the variable is referenced.

So an RE variable definition like:

    defs = r'(?smx) (?P/GO/ go \s for \s \1 )'

used like:

    rgexp = defs + r"""
        (?P=GO (it) )
        \s+
        (?P=GO (broke) )
        """

would match the string:

    "go for it go for broke"

As would:

    defs2 = r'(?smx) (?P/GO/ go \s for \s (?P=subject) )'
    rgexp = defs2 + r"""
        (?P=GO (?P<subject> it) )
        \s+
        (?P=GO (?P<subject> broke) )
        """

This would allow me to factor out sections of REs and define named, reusable RE snippets.

Please comment :-)

- Paddy.