Paul McGuire wrote:
> "Paddy" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
> > Proposal: Named RE variables
> > ======================

Hi Paul, please also refer to my reply to John.

>
> By contrast, the event declaration expression in the pyparsing Verilog
> parser is:
>
> identLead = alphas+"$_"
> identBody = alphanums+"$_"
> #~ identifier = Combine( Optional(".") +
> #~ delimitedList( Word(identLead, identBody), ".", combine=True ) ).setName("baseIdent")
> # replace pyparsing composition with Regex - improves performance ~10% for this construct
> identifier = Regex(
> r"\.?["+identLead+"]["+identBody+"]*(\.["+identLead+"]["+identBody+"]*)*" ).setName("baseIdent")
>
> eventDecl = Group( "event" + delimitedList( identifier ) + semi )

I have had years of success writing REs that extract what I am interested in, rather than reacting to what I'm not interested in, and then making slight modifications down the line as examples crop up that break the program. I do rely on the examples I get to test my extractors, but I find examples a lot easier to come by than the funds/time for a language parser. Since I tend to stay in a job for a number of years, I know that the method works and gives quick results that rapidly become dependable, as I am there to catch any flak ;-).

It's difficult for me to switch to parsers: even though examples like pyparsing seem readable, I do want to skip what I am not interested in rather than having to write a parser for everything. Conversely, when something I skipped does bite me, I want to be able to add it in easily. Are there any examples of working with parsers in this way?

>
> But why do you need an update to RE to compose snippets? Especially
> snippets that you can only use in the same RE? Just do string interp:
>
> > I could write the following RE, which I think is clearer:
> > vlog_extract = r'''(?smx)
> > # Verilog identifier definition
> > (?P/IDENT/ [A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)
> > )
> > # Verilog event definition extraction
> > (?: event \s+ (?P=IDENT) \s* (?: , \s* (?P=IDENT))* )
> > '''
> IDENT = "[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"
> vlog_extract = r'''(?smx)
> # Verilog event definition extraction
> (?: event \s+ %(IDENT)s \s* (?: , \s* %(IDENT)s)* )
> ''' % locals()
>
> Yuk, this is a mess - which '%' signs are part of RE and which are for
> string interp? Maybe just plain old string concat is better:

Yeah, I too thought that the % thing was ugly when used on an RE.

>
> IDENT = "[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"
> vlog_extract = r'''(?smx)
> # Verilog event definition extraction
> (?: event \s+ ''' + IDENT + ''' \s* (?: , \s* ''' + IDENT + ''')* )'''

And the string concats break up the visual flow of my multi-line RE.

>
> By the way, your IDENT is not totally accurate - it does not permit a
> leading ".", and it does permit leading digits in identifier elements after
> the first ".". So ".goForIt" would not be matched as a valid identifier
> when it should, and "go.4it" would be matched as valid when it shouldn't (at
> least as far as I read the Verilog grammar).

Thanks for the info on IDENT. I am not working with the grammar spec in front of me, and I know I will have to revisit my RE. You've saved me some time!

>
> (Pyparsing (http://sourceforge.net/projects/pyparsing/) is open source under
> the MIT license. The Verilog grammar is not distributed with pyparsing, and
> is only available free for noncommercial use.)
>
> -- Paul

- Paddy.

-- 
http://mail.python.org/mailman/listinfo/python-list
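Some background on why the proposal needs new syntax at all: in Python's re module, (?P=IDENT) is a backreference. It re-matches the exact text the named group already captured, not the group's pattern, so it cannot stand in for "another identifier goes here". A minimal sketch using only the standard library (the group name id and the sample strings are illustrative):

```python
import re

# (?P<id>...) defines a named group; (?P=id) must then match the SAME text again.
backref = re.compile(r"(?P<id>[A-Za-z_]\w*)\s*,\s*(?P=id)")

print(backref.fullmatch("clk, clk") is not None)  # True: same identifier twice
print(backref.fullmatch("clk, rst") is not None)  # False: a different identifier fails
```

So a list like "event a, b" would not match a pattern that uses (?P=IDENT) after the first identifier; reusing a named *pattern*, as the proposed named RE variables would, is a different operation.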
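One way to compose RE snippets without the % clutter or the broken-up concatenation is string.Template from the standard library. This is a swapped-in technique, not something proposed in the thread: ${IDENT} marks each splice point and the multi-line RE still reads top to bottom. The caveat is that any literal $ written directly in the template must be doubled to $$. A sketch, reusing the IDENT from the quoted message:

```python
import re
from string import Template

# The identifier snippet from the thread; the space before (?!\.) is ignored
# because the final pattern is compiled in verbose (x) mode.
IDENT = r"[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"

# ${IDENT} marks each splice point. Any literal '$' typed directly into this
# template (an end-of-line anchor, or an escaped \$) would have to be written as $$.
VLOG_TEMPLATE = Template(r'''(?smx)
    # Verilog event definition extraction
    (?: event \s+ ${IDENT} \s* (?: , \s* ${IDENT} )* )
''')

vlog_extract = re.compile(VLOG_TEMPLATE.substitute(IDENT=IDENT))

match = vlog_extract.search("module m; event e1, e2; endmodule")
print(match.group(0))  # event e1, e2
```

The substituted value is not scanned for placeholders, so the \$ inside IDENT itself needs no doubling; only the template text does.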
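Following Paul's correction, here is a plain-re sketch of his identifier rule (optional leading ".", each dot-separated element starting with a letter, $ or _) together with the event declaration, written in the extract-only-what-you-want style described above. The character sets mirror the identLead/identBody strings in his pyparsing snippet; IDENT_LEAD, IDENT_BODY, EVENT_DECL and event_names are hypothetical names, and the rule has not been checked against the Verilog grammar itself.

```python
import re
import string

# Character sets copied from the quoted pyparsing snippet:
# identLead = alphas + "$_", identBody = alphanums + "$_"
IDENT_LEAD = string.ascii_letters + "$_"
IDENT_BODY = string.ascii_letters + string.digits + "$_"

# Optional leading '.', then dot-separated elements that each start with a lead char.
IDENT = r"\.?[{lead}][{body}]*(?:\.[{lead}][{body}]*)*".format(
    lead=IDENT_LEAD, body=IDENT_BODY)

# 'event', a comma-separated identifier list, then ';'
# (roughly Group("event" + delimitedList(identifier) + semi)).
EVENT_DECL = re.compile(
    r"\bevent\s+(" + IDENT + r"(?:\s*,\s*" + IDENT + r")*)\s*;")


def event_names(source):
    """Return one list of declared event names per 'event ...;' statement."""
    return [[name.strip() for name in m.group(1).split(",")]
            for m in EVENT_DECL.finditer(source)]


print(event_names("module m; wire x; event e1, top.e2 ; reg y; event go; endmodule"))
# [['e1', 'top.e2'], ['go']]
```

Everything that is not an event declaration is simply skipped; if another construct starts to matter later, it can get its own pattern built from the same IDENT.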