On 2015-09-30 11:34, massi_...@msn.com wrote:
> firstly the description of my problem. I have a string in the
> following form:
>
> s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
>
> that is a string made up of groups in the form 'name' (letters
> only) plus possibly a tuple containing 1 or 2 integer values.
> Blanks can be placed between names and tuples or not, but they
> surely are placed between two groups. I would like to process this
> string in order to get a dictionary like this:
>
> d = {
>     "name1":(0, 0),
>     "name2":(1, 0),
>     "name3":(0, 0),
>     "name4":(1, 4),
>     "name5":(2, 0),
> }
>
> I guess this problem can be tackled with regular expressions, b

First out of the gate, I suggest you follow Emile's advice and try
using string expressions.  However, if you *want* to do it with
regular expressions, you can.  It's ugly and might be fragile, but

#############################################################
import re
s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
r = re.compile(r"""
    \b        # start at a word boundary
    (\w+)     # capture the word
    \s*       # optional whitespace
    (?:       # start an optional grouping for things in the parens
     \(       # a literal open-paren
      \s*     # optional whitespace
      (\d+)   # capture the number in those parens
      (?:     # start a second optional grouping for the stuff after a comma
       \s*    # optional whitespace
       ,      # a literal comma
       \s*    # optional whitespace
       (\d+)  # the second number
      )?      # make the comma and following number optional
     \)       # a literal close-paren
    )?        # make that stuff in parens optional
    """, re.X)
d = {}
for m in r.finditer(s):
    a, b, c = m.groups()
    d[a] = (int(b or 0), int(c or 0))

from pprint import pprint
pprint(d)
#############################################################

I'd stick with the commented version of the regexp if you were to use
this anywhere so that others can follow what you're doing.

-tkc
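
P.S. For comparison, here is a rough sketch of a regex-free version
that uses only str.split() and str.partition().  I can't say this is
what Emile had in mind; parse_groups is just a name I made up for
illustration, and the sketch assumes names are alphanumeric and that
parentheses are never nested or left unbalanced.

```python
def parse_groups(s):
    """Map each name in s to a 2-tuple of ints, padding with zeros.

    Hypothetical helper: assumes alphanumeric names and balanced,
    non-nested parentheses holding at most two integers.
    """
    d = {}
    # Splitting on ")" leaves at most one "(...)" tuple per chunk,
    # and that tuple always belongs to the last name in the chunk.
    for chunk in s.split(")"):
        head, sep, args = chunk.partition("(")
        words = head.split()
        # Every bare name defaults to (0, 0); junk like "..." is skipped.
        for w in words:
            if w.isalnum():
                d[w] = (0, 0)
        # If an open-paren was found, attach its numbers to the last name.
        if sep and words and words[-1].isalnum():
            nums = [int(x) for x in args.split(",") if x.strip()]
            a, b = (nums + [0, 0])[:2]
            d[words[-1]] = (a, b)
    return d


s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
d = parse_groups(s)
```

It leans on the fact that int() tolerates surrounding whitespace, so
"1, 4" needs no extra stripping.  Like the regexp, it makes no attempt
to report malformed input.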