The challenge is to turn a string like this:

    a=1,b="0234,)#($)@", k="7"

into this:

    [("a", "1"), ("b", "0234,)#($)@"), ("k", "7")]

A couple of solutions "work" for various pathological cases of input data:

  import re
  s = 'a=1,b="0234,)#($)@", k="7"'
  r = re.compile(r"""
    (?P<varname>\w+)
    \s*=\s*(?:
    "(?P<quoted>[^"]*)"
    |
    (?P<unquoted>[^,]+)
    )
    """, re.VERBOSE)
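  # Test against None rather than using "or", so that an empty
  # quoted value (b="") yields "" instead of None.
  results = [
    (m.group('varname'),
      m.group('quoted') if m.group('quoted') is not None
      else m.group('unquoted')
    )
    for m in r.finditer(s)
    ]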

############### or ##############################

  r = re.compile(r"""
    (\w+)
    \s*=\s*(
    "[^"]*"
    |
    [^,]+
    )
    """, re.VERBOSE)
  results = [
    (m.group(1), m.group(2).strip('"'))
    for m in r.finditer(s)
    ]

Things like internal quoting ('b="123\"456", c="123""456"') would require a slightly smarter parser.
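
As a rough sketch of what "slightly smarter" might look like, the quoted branch can be widened to accept backslash escapes and doubled quotes, with a second pass to unescape the captured value. The names PAIR, UNESCAPE and parse_pairs are made up for illustration, and this assumes those two quoting styles are the only ones in play; for anything hairier a real tokenizer is probably the better route.

```python
import re

# Hypothetical names for illustration; not part of the solutions above.
PAIR = re.compile(r'''
    (?P<varname>\w+)
    \s*=\s*
    (?:
        "(?P<quoted>(?:[^"\\]|\\.|"")*)"   # quoted value; allows backslash escapes and doubled quotes
      |
        (?P<unquoted>[^,]+)                # bare value, runs up to the next comma
    )
    ''', re.VERBOSE)

# Matches either a backslash-escaped character or a doubled quote.
UNESCAPE = re.compile(r'\\(.)|""')

def parse_pairs(s):
    """Return a list of (name, value) tuples parsed from s."""
    results = []
    for m in PAIR.finditer(s):
        quoted = m.group('quoted')
        if quoted is not None:
            # \x becomes x, and "" becomes a single "
            value = UNESCAPE.sub(lambda e: e.group(1) or '"', quoted)
        else:
            value = m.group('unquoted').strip()
        results.append((m.group('varname'), value))
    return results

print(parse_pairs('a=1,b="0234,)#($)@", k="7"'))
print(parse_pairs(r'b="123\"456", c="123""456"'))
```

The three alternatives inside the quoted group each start with a different character, so the regex should not backtrack badly on long values. Unterminated quotes still fall through to the unquoted branch, same as before.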

-tkc




--
http://mail.python.org/mailman/listinfo/python-list
