>>>>> hubritic <colinland...@gmail.com> (h) wrote:

>h> I want to parse a log that has entries like this:

>h> [2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail
>h> cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0
>h> rcpts=1
>h> routes=DL_UK_ALL,NOT_DL_UK_ALL,default_inbound,firewallsafe,mail01_mail02,spfsafe
>h> size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd hdr_mid=
>h> qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236
>h> elapsed=0.280

>h> The keywords will not always be the same. Also, different log levels
>h> will produce a different mix of keywords.

>h> This is good enough to get the majority of cases where there is a
>h> keyword, a "=" and then a value with no spaces:

>h> Group(Word(alphas + "+_-.").setResultsName("keyword") + Suppress
>h> (Literal ("=")) + Optional(Word(printables)))

>h> Sometimes there is a subject, which is a quoted string. That is easy
>h> enough to get with this:

>h> dblQuotedString(ZeroOrMore(Word(printables) ) )

>h> My problem is combining them into one expression. Either I wind up
>h> with just the subject or I wind up with the keywords and their
>h> values, one of which is:

>h> subject, '"I'

>h> which is clearly not what I want.

>h> Do I scan each line twice, first looking for quotes?

Use MatchFirst (|), with the quoted-string alternative first so that it
is tried before the generic Word(printables). I have also split the
pattern up to make it more readable:

from pyparsing import (Word, Literal, Suppress, Optional, Group,
                       dblQuotedString, alphas, printables)

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress(Literal("="))
# MatchFirst: a quoted value such as subject="..." wins over Word(printables)
value = dblQuotedString | Optional(Word(printables))
pattern = Group(kw + eq + value)

-- 
Piet van Oostrum <p...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org
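For anyone who wants to check the "quoted string first" idea against the sample entry without pyparsing installed, here is a minimal stdlib-only sketch. It uses Python's `re` module instead of pyparsing, so it is an analogue of the MatchFirst approach rather than the pyparsing code itself: the value alternation tries a double-quoted string first and only then falls back to a run of non-space characters. `KV` and `parse_entry` are hypothetical names chosen for illustration. The sample also exposes the empty `hdr_mid=` field. With pyparsing's default whitespace skipping, `Optional(Word(printables))` will likely skip the space and take `qid=n2HCS4ks025832` as the value of `hdr_mid`. In the regex below, `\S*` simply matches nothing there.

```python
import re

# Sketch only: a stdlib `re` analogue of pyparsing's MatchFirst, not pyparsing.
# Keyword characters mirror alphas + "+_-." from the pattern above.
# The value alternation tries a double-quoted string first, then \S*;
# \S* may match nothing, so an empty field such as `hdr_mid=` stays empty.
KV = re.compile(r'([A-Za-z+_.-]+)=("[^"]*"|\S*)')


def parse_entry(line):
    """Return a dict of keyword -> value for one log entry (hypothetical helper)."""
    fields = {}
    for key, value in KV.findall(line):
        # Strip the surrounding quotes only from a complete quoted value.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


sample = ('[2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail '
          'cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0 '
          'rcpts=1 routes=DL_UK_ALL,NOT_DL_UK_ALL,default_inbound,firewallsafe,'
          'mail01_mail02,spfsafe size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd '
          'hdr_mid= qid=n2HCS4ks025832 subject="I want to interview you" '
          'duration=0.236 elapsed=0.280')

fields = parse_entry(sample)
print(fields["subject"])        # I want to interview you
print(repr(fields["hdr_mid"]))  # ''
```

The leading timestamp and the bare `rprt` token are skipped because neither is followed by `=`. Only `keyword=value` pairs end up in the dictionary.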