On Mon, 29 Oct 2007 05:45:26 -0700, Paul McGuire <[EMAIL PROTECTED]> wrote:
>On Oct 29, 1:11 am, avidfan <[EMAIL PROTECTED]> wrote:
>> Help with pyparsing and dealing with null values
>>
>> I am trying to parse a log file (web.out) similar to this:
>>
>> -----------------------------------------------------------
>>
>> MBeanName: "mtg-model:Name=mtg-model_managed2,Type=Server"
>>     AcceptBacklog: 50
><snip>
>>     ExpectedToRun: false
>>     ExternalDNSName:
>>     ExtraEjbcOptions:
>>     ExtraRmicOptions:
>>     GracefulShutdownTimeout: 0
>>
>> -----------------------------------------------------------
>>
>> and I need the indented values (eventually) in a dictionary. As you
>> can see, some of the fields have a value, and some do not. It appears
>> that the code I have so far is not dealing with the null values and
>> colons as I had planned.
>>
>
>This is a very good first cut at the problem. Here are some tips to
>get you going again:
>
>1. Literal("\n") won't work; use LineEnd() instead. Literals are for
>non-whitespace literal strings.
>
>
>2. "all = SkipTo(end)" can be removed; use restOfLine instead of all.
>("all" as a variable name masks Python 2.5's "all" builtin function.)
>
>
>3. In addition to identity, you might consider defining some other
>known value types:
>
>boolean = oneOf("true false")
>boolean.setParseAction(lambda toks: toks[0]=="true")
>
>integer = Combine(Optional("-") + Word(nums))
>integer.setParseAction(lambda toks: int(toks[0]))
>
>These will do data conversion for you at parse time, so that the
>values are already in int or bool form when you access them later.
>
>
>4. The significant change is to this line (I've replaced all with
>restOfLine):
>
>pairs = Group(identity + colon + Optional(identity) + restOfLine)
>
>The problem is that pyparsing's whitespace-skipping will read an
>identity even if it's not on the same line. So for keys that have no
>value given, you end up reading past the end-of-line and taking the
>next key name as the value for the previous key. To work around
>this, define the value as something which must be on the same line,
>using the NotAny lookahead, which you can abbreviate using the ~
>operator.
>
>pairs = Group(identity + colon + Optional(~end + (identity | restOfLine)) + end)
>
>If we add in the other known value types, this gets a bit unwieldy, so
>I recommend you define value separately:
>
>value = boolean | integer | identity | restOfLine
>pairs = Group(identity + colon + Optional(~end + value) + end)
>
>At this point, I think you have a working parser for your log data.
>
>
>5. (Extra Credit) Lastly, to create a dictionary, you are all set to
>just add pyparsing's Dict class. Change:
>
>logEntry = MBeanName + ServerName("servername") + OneOrMore(pairs)
>
>to:
>
>logEntry = MBeanName + ServerName("servername") + Dict(OneOrMore(pairs))
>
>(I've also removed ".setResultsName", using the new shortened form for
>setting results names.)
>
>Dict will return the parsed tokens as-is, but it will also define
>results names using the tokens[0] element of each list of tokens
>returned by pairs. The values will be the tokens[1:], so that if a
>value expression contains multiple tokens, they all will be associated
>with the results name key.
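
For the archives, tips 1 through 5 can be put together roughly as in the sketch below. The original script is not shown in this thread, so the definitions of identity, colon and end here are assumptions (in particular the character set allowed in identity, and suppressing the colon and line end), and the sample input is a shortened stand-in for web.out. The "naive" expression is included only to reproduce the read-past-end-of-line problem described in tip 4.

```python
from pyparsing import (Combine, Dict, Group, LineEnd, OneOrMore, Optional,
                       Suppress, Word, alphanums, nums, oneOf, restOfLine)

# Assumed definitions: not shown in the thread, chosen to fit the sample data.
identity = Word(alphanums + "-_.,")
colon = Suppress(":")
end = LineEnd().suppress()

# Tip 4, the problem: whitespace skipping also skips newlines, so a key
# with no value swallows the next key name as its value.
naive = Group(identity + colon + Optional(identity))
naive_tokens = OneOrMore(naive).parseString(
    "ExternalDNSName:\nExtraEjbcOptions:\n")
print(naive_tokens.asList())

# Tip 3: typed values, converted at parse time.
boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0] == "true")
integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

# Tip 4, the fix: ~end (NotAny) refuses to read a value when the
# line has already ended.
value = boolean | integer | identity | restOfLine
pairs = Group(identity + colon + Optional(~end + value) + end)

# Tip 5: Dict names each group after its first token.
entries = Dict(OneOrMore(pairs))

sample = (
    "AcceptBacklog: 50\n"
    "ExpectedToRun: false\n"
    "ExternalDNSName:\n"
    "Cluster: mtg-model-cluster\n"
    "CompleteMessageTimeout: -1\n"
)
tokens = entries.parseString(sample)
print(tokens.dump())
print(tokens["AcceptBacklog"], tokens["ExpectedToRun"])
```

Keys with no value come back as an empty string when looked up by name, which is easy to test for when copying the results into a plain dictionary.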
>
>Now you can replace the results listing code:
>
>    for t in tokens:
>        print t
>
>with
>
>    print tokens.dump()
>
>And you can access the tokens as if they are a dict, using:
>
>    print tokens.keys()
>    print tokens.values()
>    print tokens["ClasspathServletDisabled"]
>
>If you prefer, for keys that are valid Python identifiers (all of
>yours appear to be), you can just use object.attribute notation:
>
>    print tokens.ClasspathServletDisabled
>
>Here is some sample output, using dump(), keys(), and attribute
>lookup:
>
>tokens.dump() -> ['MBeanName:',
>'"mtg-model:Name=mtg-model_managed2,Type=Server"',
>['AcceptBacklog', 50], ['AdministrationPort', 0],
>['AutoKillIfFailed', False], ['AutoRestart', True],
>['COM', 'mtg-model_managed2'], ['COMEnabled', False],
>['CachingDisabled', True], ['ClasspathServletDisabled', False],
>['ClientCertProxyEnabled', False], ['Cluster', 'mtg-model-cluster'],
>['ClusterRuntime', 'mtg-model-cluster'], ['ClusterWeight', 100],
>['CompleteCOMMessageTimeout', -1], ['CompleteHTTPMessageTimeout', -1],
>['CompleteIIOPMessageTimeout', -1], ['CompleteMessageTimeout', 60],
>['CompleteT3MessageTimeout', -1], ['CustomIdentityKeyStoreFileName'],
>['CustomIdentityKeyStorePassPhrase'],
>['CustomIdentityKeyStorePassPhraseEncrypted'],
>['CustomIdentityKeyStoreType'], ['CustomTrustKeyStoreFileName'],
>['CustomTrustKeyStorePassPhrase'],
>['CustomTrustKeyStorePassPhraseEncrypted'],
>['CustomTrustKeyStoreType'], ['DefaultIIOPPassword'],
>['DefaultIIOPPasswordEncrypted'], ['DefaultIIOPUser'],
>['DefaultInternalServletsDisabled', False], ['DefaultProtocol', 't3'],
>['DefaultSecureProtocol', 't3s'], ['DefaultTGIOPPassword'],
>['DefaultTGIOPPasswordEncrypted', ' ****** '],
>['DefaultTGIOPUser', 'guest'], ['DomainLogFilter'],
>['EnabledForDomainLog', True],
>['ExecuteQueues', 'weblogic.kernel.Default,foglight'],
>['ExpectedToRun', False], ['ExternalDNSName'], ['ExtraEjbcOptions'],
>['ExtraRmicOptions'], ['GracefulShutdownTimeout', 0]]
>- AcceptBacklog: 50
>- AdministrationPort: 0
>- AutoKillIfFailed: False
>- AutoRestart: True
>- COM: mtg-model_managed2
>- COMEnabled: False
>- CachingDisabled: True
>- ClasspathServletDisabled: False
>- ClientCertProxyEnabled: False
>- Cluster: mtg-model-cluster
>- ClusterRuntime: mtg-model-cluster
>- ClusterWeight: 100
>- CompleteCOMMessageTimeout: -1
>- CompleteHTTPMessageTimeout: -1
>- CompleteIIOPMessageTimeout: -1
>- CompleteMessageTimeout: 60
>- CompleteT3MessageTimeout: -1
>- CustomIdentityKeyStoreFileName:
>- CustomIdentityKeyStorePassPhrase:
>- CustomIdentityKeyStorePassPhraseEncrypted:
>- CustomIdentityKeyStoreType:
>- CustomTrustKeyStoreFileName:
>- CustomTrustKeyStorePassPhrase:
>- CustomTrustKeyStorePassPhraseEncrypted:
>- CustomTrustKeyStoreType:
>- DefaultIIOPPassword:
>- DefaultIIOPPasswordEncrypted:
>- DefaultIIOPUser:
>- DefaultInternalServletsDisabled: False
>- DefaultProtocol: t3
>- DefaultSecureProtocol: t3s
>- DefaultTGIOPPassword:
>- DefaultTGIOPPasswordEncrypted: ******
>- DefaultTGIOPUser: guest
>- DomainLogFilter:
>- EnabledForDomainLog: True
>- ExecuteQueues: weblogic.kernel.Default,foglight
>- ExpectedToRun: False
>- ExternalDNSName:
>- ExtraEjbcOptions:
>- ExtraRmicOptions:
>- GracefulShutdownTimeout: 0
>- servername: "mtg-model:Name=mtg-model_managed2,Type=Server"
>
>tokens.keys() -> ['ClasspathServletDisabled', 'servername',
>'ExternalDNSName', 'CustomTrustKeyStoreFileName', 'DefaultIIOPUser',
>'ExpectedToRun', 'CachingDisabled', 'CompleteHTTPMessageTimeout',
>'CompleteIIOPMessageTimeout', 'AutoKillIfFailed',
>'ClientCertProxyEnabled', 'ExtraEjbcOptions',
>'CustomTrustKeyStorePassPhraseEncrypted', 'COM',
>'CompleteMessageTimeout', 'CustomIdentityKeyStoreType',
>'CustomTrustKeyStoreType', 'EnabledForDomainLog', 'AutoRestart',
>'DefaultTGIOPPasswordEncrypted', 'CompleteCOMMessageTimeout',
>'DefaultInternalServletsDisabled', 'DefaultProtocol', 'ClusterWeight',
>'ExecuteQueues', 'ExtraRmicOptions', 'CompleteT3MessageTimeout',
>'DefaultTGIOPUser', 'AcceptBacklog', 'DefaultIIOPPassword',
>'DefaultSecureProtocol', 'COMEnabled',
>'CustomIdentityKeyStoreFileName', 'DefaultTGIOPPassword',
>'CustomIdentityKeyStorePassPhraseEncrypted',
>'GracefulShutdownTimeout', 'DefaultIIOPPasswordEncrypted',
>'CustomIdentityKeyStorePassPhrase', 'ClusterRuntime', 'Cluster',
>'DomainLogFilter', 'CustomTrustKeyStorePassPhrase',
>'AdministrationPort']
>
>tokens.ClasspathServletDisabled -> False
>
>
>Cheers,
>-- Paul
>

Thanks, Paul!  That's exactly what I needed!
--
http://mail.python.org/mailman/listinfo/python-list