The challenge is to turn a string like this:
a=1,b="0234,)#($)@", k="7"
into this:
[("a", "1"), ("b", "0234,)#($)@"), ("k", "7")]
A couple of solutions that "work", give or take various pathological
cases of input data:
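import re

s = 'a=1,b="0234,)#($)@", k="7"'

# Named groups: a value is either double-quoted or runs to the next comma.
r = re.compile(r"""
    (?P<varname>\w+)
    \s*=\s*(?:
        "(?P<quoted>[^"]*)"
      |
        (?P<unquoted>[^,]+)
    )
    """, re.VERBOSE)

# Test against None rather than using "or", so an empty quoted
# value (k="") yields "" instead of None.
results = [
    (m.group('varname'),
     m.group('quoted') if m.group('quoted') is not None
     else m.group('unquoted')
    )
    for m in r.finditer(s)
    ]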
############### or ##############################
# Positional groups: capture the value with its quotes, then strip them.
r = re.compile(r"""
    (\w+)
    \s*=\s*(
        "(?:[^"]*)"
      |
        [^,]+
    )
    """, re.VERBOSE)
results = [
    (m.group(1), m.group(2).strip('"'))
    for m in r.finditer(s)
    ]
Things like internal quoting ('b="123\"456", c="123""456"') would
require a slightly smarter parser.
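
As a rough sketch of what "slightly smarter" might look like, the
quoted branch can be widened to accept backslash escapes and doubled
quotes, with a second pass to unescape them. The names PAIR, UNESCAPE
and parse_pairs are made up for illustration, and it assumes a
backslash escapes any single following character and that "" inside
quotes means one literal quote. It's still a regex, not a real parser:

```python
import re

# Hypothetical pattern: inside quotes, accept ordinary characters,
# backslash-escaped characters (\"), or doubled quotes ("").
PAIR = re.compile(r'''
    (?P<varname>\w+)
    \s*=\s*(?:
        "(?P<quoted>(?:[^"\\]|\\.|"")*)"
      |
        (?P<unquoted>[^,]+)
    )
    ''', re.VERBOSE)

# Matches either a backslash escape or a doubled quote.
UNESCAPE = re.compile(r'\\(.)|""')


def _unescape(m):
    # \x becomes x; "" becomes a single ".
    return m.group(1) if m.group(1) is not None else '"'


def parse_pairs(s):
    results = []
    for m in PAIR.finditer(s):
        quoted = m.group('quoted')
        if quoted is not None:
            value = UNESCAPE.sub(_unescape, quoted)
        else:
            # Unquoted values run to the next comma; trim stray spaces.
            value = m.group('unquoted').strip()
        results.append((m.group('varname'), value))
    return results


print(parse_pairs('a=1,b="0234,)#($)@", k="7"'))
print(parse_pairs(r'b="123\"456", c="123""456"'))
```

Unterminated quotes and other malformed input will still fall through
to the unquoted branch, so anything beyond this probably wants a real
tokenizer.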
-tkc