Re: parsing question

Tim Chase Mon, 31 May 2010 08:11:59 -0700

On 05/31/2010 08:42 AM, Mag Gam wrote:

I have a file with a bunch of nfsstat -c output (on AIX) covering all the
hostnames, for example

...

Is there an easy way to parse this file according to each host?


So,
r1svr.Connectionless.calls=6553
r1svr.Connectionless.badcalls=0

and so on...


I am currently using awk, which gets me what I need, but I am
curious how people handle block data in Python.

Since you already profess to having an awk solution, I felt it was okay to at least take a stab at my implementation (rather than doing your job for you :). Without a complete spec for the output, it's a bit of guesswork, but I got something fairly close to what you want. It uses nested dictionaries, which means the keys and values have to be referenced like


  servers["r1svr"]["connectionless"]["calls"]

and the values are strings (I'm not sure what you want in the case of the data that has both a value and percentage), not ints/floats/percentages/etc.
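If numbers are wanted rather than strings, a conversion step could be layered on top. The following is only a rough sketch: the `to_number` helper is hypothetical, the `retrans` key and its "12 5%" value are made-up sample data, and it assumes a value-plus-percentage field puts the count first.

```python
# Hypothetical sample using the three-level lookup shown above.
# The "retrans" entry and its "12 5%" value are assumptions, not real data.
servers = {}
leaf = servers.setdefault("r1svr", {}).setdefault("connectionless", {})
leaf["calls"] = "6553"
leaf["badcalls"] = "0"
leaf["retrans"] = "12 5%"


def to_number(text):
    """Return the leading count as an int, ignoring any trailing percentage.

    Returns None for an empty field.
    """
    parts = text.split()
    return int(parts[0]) if parts else None


calls = to_number(servers["r1svr"]["connectionless"]["calls"])
retrans = to_number(servers["r1svr"]["connectionless"]["retrans"])
print(calls, retrans)
```

Keeping the raw strings in the dictionaries and converting only on lookup leaves the percentage available for anyone who wants it.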


That said, this should get you fairly close to what you describe:

###########################################

import re
header_finding_re = re.compile(r'\b\w{2,}')
version_re = re.compile(r'^Version (\d+):\s*\(.*\)$', re.I)
CLIENT_HEADER = 'Client '
CONNECTION_HEADER = 'Connection'
servers = {}
server = client = orig_client = subtype = None
source = open('data.txt')
for line in source:
  line = line.rstrip('\r\n')
  if not line.strip(): continue
  if line.startswith('='*5) and line.endswith('='*5):
    server = line.strip('=')
    client = orig_client = subtype = None
  elif line.startswith(CLIENT_HEADER):
    orig_client = client = line[len(CLIENT_HEADER):-1]
    subtype = 'all'
  elif line.startswith(CONNECTION_HEADER):
    subtype = line.replace(' ', '').lower()
  else: # it's a version or header row
    m = version_re.match(line)
    if m:
      subtype = "v" + m.group(1)
    else:
      if None in (server, client, subtype):
        print("Missing data %r" % ((server, client, subtype),))
        continue
      dest = servers.setdefault(server, {}
        ).setdefault(client, {}
        ).setdefault(subtype, {})
      data = next(source)  # the values row follows its header row
      row = header_finding_re.finditer(line)
      prev = next(row)
      for header in row:
        key = prev.group(0)
        value = data[prev.start():header.start()].strip()
        prev = header
        dest[key] = value
      key = prev.group(0)
      value = data[prev.start():].strip()
      dest[key] = value

for server, clients in servers.items():
  for client, subtypes in clients.items():
    for subtype, kv in subtypes.items():
      for key, value in kv.items():
        print("%s = %s" % (".".join([server, client, subtype, key]), value))

###########################################
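The core trick in the script is that each header word marks the left edge of a column, so the row of values beneath it can be sliced at the same offsets. Pulled out on its own, that step might look roughly like the sketch below. The `split_by_header` name and the sample rows are hypothetical, and real nfsstat output may not line up this neatly.

```python
import re

# Same pattern as the script: a header is a run of two or more word characters.
HEADER_RE = re.compile(r'\b\w{2,}')


def split_by_header(header_line, data_line):
    """Map each header word to the text of data_line in the same columns."""
    matches = list(HEADER_RE.finditer(header_line))
    result = {}
    for i, match in enumerate(matches):
        start = match.start()
        # A column runs to the start of the next header, or to end of line.
        end = matches[i + 1].start() if i + 1 < len(matches) else None
        result[match.group(0)] = data_line[start:end].strip()
    return result


# Hypothetical sample rows, padded so the columns line up (an assumption).
header = "calls".ljust(11) + "badcalls".ljust(11) + "retrans"
data = "6553".ljust(11) + "0".ljust(11) + "12 5%"
print(split_by_header(header, data))
```

Slicing by position rather than splitting on whitespace is what keeps a field such as "12 5%" together as one value.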

Have fun,

-tkc




--
http://mail.python.org/mailman/listinfo/python-list
