John Machin <sjmac...@lexicon.net> wrote:
> On Mar 27, 6:51 am, "R. David Murray" <rdmur...@bitdance.com> wrote:
> > OK, I've got a little problem that I'd like to ask the assembled minds
> > for help with. I can write code to parse this, but I'm thinking it may
> > be possible to do it with regexes. My regex foo isn't that good, so if
> > anyone is willing to help (or offer an alternate parsing suggestion)
> > I would be grateful. (This has to be stdlib only, by the way, I
> > can't introduce any new modules into the application so pyparsing is
> > not an option.)
> >
> > The challenge is to turn a string like this:
> >
> > a=1,b="0234,)#($)@", k="7"
> >
> > into this:
> >
> > [("a", "1"), ("b", "0234,)#($)#"), ("k", "7")]
>
> The challenge is for you to explain unambiguously what you want.
>
> 1. a=1 => "1" and k="7" => "7" ... is this a mistake or are the quotes
> optional in the original string when not required to protect a comma?

Optional.

> 2. What is the rule that explains the transmogrification of @ to # in
> your example?

Now that's a mistake :)

> 3. Is the input guaranteed to be syntactically correct?

If it's not, it's the customer that gets to deal with the error.

> The following should do close enough to what you want; adjust as
> appropriate.
>
> >>> import re
> >>> s = """a=1,b="0234,)#($)@", k="7" """
> >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
> >>> rx.findall(s)
> [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
> >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
> [('a', '1'), ('b', '2')]
> >>>

I'm going to save this one and study it, too. I'd like to learn to use
regexes better, even if I do try to avoid them when possible :)

--
R. David Murray           http://www.bitdance.com
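Machin's findall output keeps the quotes around quoted values ('"7"'), while the requested result has them stripped ("7"), since the quotes are optional and only protect commas. A minimal stdlib-only sketch of that last step, reusing his pattern unchanged; the helper name parse_pairs is hypothetical, and it assumes a quoted value never contains an escaped double quote.

```python
import re

# John Machin's pattern: a key, "=", then either an unquoted value (no
# commas or quotes) or a double-quoted value (no embedded quotes),
# followed by optional spaces and a comma or the end of the string.
PAIR_RE = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')


def parse_pairs(text):
    """Return (key, value) tuples with the optional quotes removed.

    Hypothetical helper; assumes quoted values never contain \\" escapes.
    """
    pairs = []
    for key, value in PAIR_RE.findall(text):
        # Only the quoted alternative can start with '"', and it always
        # ends with '"' too, so slicing off one char at each end is safe.
        if value.startswith('"'):
            value = value[1:-1]
        pairs.append((key, value))
    return pairs


print(parse_pairs('a=1,b="0234,)#($)@", k="7"'))
# [('a', '1'), ('b', '0234,)#($)@'), ('k', '7')]
```

Note that an unquoted value keeps any spaces before the following comma (a=1 ,b=2 gives '1 '); add a .strip() if that matters for your input.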