On Mar 27, 8:43 am, Terry Reedy <tjre...@udel.edu> wrote:
> R. David Murray wrote:
> > OK, I've got a little problem that I'd like to ask the assembled minds
> > for help with. I can write code to parse this, but I'm thinking it may
> > be possible to do it with regexes. My regex foo isn't that good, so if
> > anyone is willing to help (or offer an alternate parsing suggestion)
> > I would be grateful. (This has to be stdlib only, by the way, I
> > can't introduce any new modules into the application so pyparsing is
> > not an option.)
> >
> > The challenge is to turn a string like this:
> >
> > a=1,b="0234,)#($)@", k="7"
> >
> > into this:
> >
> > [("a", "1"), ("b", "0234,)#($)#"), ("k", "7")]
>
> But the starting string IS in csv format, where the values are strings
> with the format name=string.
>
> >>> import csv
> >>> myDialect = csv.excel
> >>> myDialect.skipinitialspace = True # needed for space before 'k'
> >>> a=list(csv.reader(['''a=1,b="0234,)#($)@", k="7"'''], myDialect))[0]
> >>> a
> ['a=1', 'b="0234', ')#($)@"', 'k="7"']
> >>> b=[tuple(s.split('=',1)) for s in a]
> >>> b
> [('a', '1'), ('b', '"0234'), (')#($)@"',), ('k', '"7"')]
>
It's in the csv format that Excel accepts on input, but that is irrelevant. The output does not meet the OP's requirements: it has taken the should-have-been-protected comma as a delimiter and produced FOUR elements instead of THREE. The csv reader only treats a quote as a quote character when it is the first character of a field; in b="0234,)#($)@" the quote comes after b=, so it is kept as ordinary data and the comma inside the quotes splits the field. Also note that '"0234' has a leading " and ')#($)@"' has a trailing ".
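
For what it's worth, here is a rough stdlib-only sketch of the regex route the OP asked about. It rests on assumptions that are mine, not the OP's: names are plain \w+ identifiers, quoted values contain no embedded or escaped double quotes, and unquoted values contain no commas. The names PAIR and parse_pairs are made up for illustration. It returns each value exactly as written, so b comes back ending in @ (I'm assuming the trailing # in the OP's expected output is a typo).

```python
import re

# One name=value pair. The value is either a double-quoted string (which may
# contain commas) or a bare run of characters up to the next comma.
# Assumption: no escaped or embedded double quotes inside a quoted value.
PAIR = re.compile(r'\s*(\w+)\s*=\s*(?:"([^"]*)"|([^,]*))\s*(?:,|$)')

def parse_pairs(text):
    """Return a list of (name, value) tuples parsed from text."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = PAIR.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse at offset {pos}: {text[pos:]!r}")
        name, quoted, bare = m.groups()
        # Exactly one of quoted/bare is set; bare values may carry
        # trailing whitespace, so strip it.
        pairs.append((name, quoted if quoted is not None else bare.strip()))
        pos = m.end()
    return pairs

print(parse_pairs('a=1,b="0234,)#($)@", k="7"'))
```

Matching pair by pair from a moving offset, rather than splitting on commas first, is what keeps the comma inside the quotes from being treated as a delimiter. Input that doesn't fit the pattern raises ValueError instead of being silently skipped.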