On Mar 29, 9:42 am, Shane Geiger <[EMAIL PROTECTED]> wrote:
> It would be worth learning pyparsing to do this.
>
Thanks to Shane and Steven for the ref to pyparsing. I also was struck
by this post, thinking "this is pyparsing written in re's and dicts".

The approach you are taking is *very* much like the thought process I
went through when first implementing pyparsing. I wanted to easily
compose expressions from other expressions. In your case, you are
string interpolating using a cumulative dict of prior expressions.
Pyparsing uses various subclasses of the ParserElement class, with
operator definitions for alternation ("|" or "^" depending on
non-greedy vs. greedy), composition ("+"), and negation ("~").
Pyparsing also uses its own extended results construct, ParseResults,
which supports named results fields, accessible using list indices,
dict names, or instance names.

Here is the pyparsing treatment of your example (I may not have gotten
every part correct, but my point is more the similarity of our
approaches). Note the access to the smtp parameters via the Dict
transformer.

-- Paul

from pyparsing import *

# <dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>
intgr = Word(nums)
dotnum = Combine(intgr + "." + intgr + "." + intgr + "." + intgr)

# <dot-string> ::= <string> | <string> "." <dot-string>
string_ = Word(alphanums)
dotstring = Combine(delimitedList(string_,"."))

# <domain> ::= <element> | <element> "." <domain>
domain = dotnum | dotstring

# <q> ::= any one of the 128 ASCII characters except <CR>, <LF>, quote ("), or backslash (\)
# <x> ::= any one of the 128 ASCII characters (no exceptions)
# <qtext> ::= "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
# <quoted-string> ::= """ <qtext> """
quotedString = dblQuotedString # <- just use pre-defined expr from pyparsing

# <local-part> ::= <dot-string> | <quoted-string>
localpart = (dotstring | quotedString).setResultsName("localpart")

# <mailbox> ::= <local-part> "@" <domain>
mailbox = Combine(localpart + "@" + domain).setResultsName("mailbox")

# <path> ::= "<" [ <a-d-l> ":" ] <mailbox> ">"
# also accept address without <>
path = "<" + mailbox + ">" | mailbox

# esmtp-keyword ::= (ALPHA / DIGIT) *(ALPHA / DIGIT / "-")
esmtpkeyword = Word(alphanums,alphanums+"-")

# esmtp-value ::= 1*<any CHAR excluding "=", SP, and all
#                 control characters (US ASCII 0-31 inclusive)>
#                 ; syntax and values depend on esmtp-keyword
esmtpvalue = Regex(r'[^= \t\r\n\f\v]*')

# esmtp-parameter ::= esmtp-keyword ["=" esmtp-value]
esmtpparameters = Dict(
    ZeroOrMore( Group(esmtpkeyword + Suppress("=") + esmtpvalue) ) )

# esmtp-cmd ::= inner-esmtp-cmd [SP esmtp-parameters] CR LF
esmtp_addr = path + \
    Optional(esmtpparameters,default=[])\
    .setResultsName("parameters")

# tests is the list of sample command strings from the original post
for t in tests:
    for keyword in [ 'MAIL FROM:', 'RCPT TO:' ]:
        keylen=len(keyword)
        if t[:keylen].upper()==keyword:
            t=t[keylen:]
            break
    try:
        match = esmtp_addr.parseString(t)
        print 'MATCH'
        print match.dump()
        # some sample code to access elements of the parameters "dict"
        if "SIZE" in match.parameters:
            print "SIZE is", match.parameters.SIZE
        print
    except ParseException,pe:
        print 'DONT match', t

prints:

MATCH
['<', ['[EMAIL PROTECTED]'], '>']
- mailbox: ['[EMAIL PROTECTED]']
  - localpart: johnsmith
- parameters: []

MATCH
[['[EMAIL PROTECTED]']]
- mailbox: ['[EMAIL PROTECTED]']
  - localpart: johnsmith
- parameters: []

MATCH
['<', ['[EMAIL PROTECTED]'], '>', ['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
- OTHER: [EMAIL PROTECTED]
- SIZE: 1234
- mailbox: ['[EMAIL PROTECTED]']
  - localpart: johnsmith
- parameters: [['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
  - OTHER: [EMAIL PROTECTED]
  - SIZE: 1234
SIZE is 1234

MATCH
[['[EMAIL PROTECTED]'], ['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
- OTHER: [EMAIL PROTECTED]
- SIZE: 1234
- mailbox: ['[EMAIL PROTECTED]']
  - localpart: johnsmith
- parameters: [['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
  - OTHER: [EMAIL PROTECTED]
  - SIZE: 1234
SIZE is 1234

MATCH
['<', ['"[EMAIL PROTECTED]> legal=email"@addresscom'], '>']
- mailbox: ['"[EMAIL PROTECTED]> legal=email"@addresscom']
  - localpart: "[EMAIL PROTECTED]> legal=email"
- parameters: []

MATCH
[['"[EMAIL PROTECTED]> legal=email"@addresscom']]
- mailbox: ['"[EMAIL PROTECTED]> legal=email"@addresscom']
  - localpart: "[EMAIL PROTECTED]> legal=email"
- parameters: []

MATCH
['<', ['"[EMAIL PROTECTED]> legal=email"@addresscom'], '>', ['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
- OTHER: [EMAIL PROTECTED]
- SIZE: 1234
- mailbox: ['"[EMAIL PROTECTED]> legal=email"@addresscom']
  - localpart: "[EMAIL PROTECTED]> legal=email"
- parameters: [['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
  - OTHER: [EMAIL PROTECTED]
  - SIZE: 1234
SIZE is 1234

MATCH
[['"[EMAIL PROTECTED]> legal=email"@addresscom'], ['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
- OTHER: [EMAIL PROTECTED]
- SIZE: 1234
- mailbox: ['"[EMAIL PROTECTED]> legal=email"@addresscom']
  - localpart: "[EMAIL PROTECTED]> legal=email"
- parameters: [['SIZE', '1234'], ['OTHER', '[EMAIL PROTECTED]']]
  - OTHER: [EMAIL PROTECTED]
  - SIZE: 1234
SIZE is 1234
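
For comparison, the re-and-dict style mentioned at the top of this
message (string interpolation over a cumulative dict of prior
expressions) might look roughly like the sketch below. The original
poster's code is not reproduced in this message, so the names rx and
define, the simplified patterns, and the example.com address are all
assumptions made for illustration, not the poster's actual definitions.

```python
import re

# Cumulative dict of named regex fragments (hypothetical name: rx).
# Each new fragment may interpolate any fragment defined before it.
rx = {}

def define(name, pattern):
    """Interpolate earlier fragments into pattern and store the result."""
    rx[name] = pattern % rx

# Simplified stand-ins for the grammar rules; quoted strings are omitted.
define("snum", r"\d{1,3}")
define("dotnum", r"%(snum)s(?:\.%(snum)s){3}")
define("string", r"[A-Za-z0-9]+")
define("dotstring", r"%(string)s(?:\.%(string)s)*")
define("domain", r"(?:%(dotnum)s|%(dotstring)s)")
define("mailbox", r"(?P<localpart>%(dotstring)s)@(?P<domain>%(domain)s)")
# Simplification: the angle brackets are optional and not required to balance.
define("path", r"<?(?P<mailbox>%(mailbox)s)>?")

m = re.match(rx["path"] + r"$", "<johnsmith@example.com>")
print(m.group("mailbox"), m.group("localpart"), m.group("domain"))
# prints: johnsmith@example.com johnsmith example.com
```

One difference this makes visible: re keeps all named groups in a single
flat namespace and rejects a pattern that defines the same group name
twice, so a fragment containing a named group can only be interpolated
once. The pyparsing results above nest instead (localpart under
mailbox, SIZE under parameters).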