2011/10/10 galyle <gal...@gmail.com>:
> HI, I've looked through this forum, but I haven't been able to find a
> resolution to the problem I'm having (maybe I didn't look hard enough
> -- I have to believe this has come up before). The problem is this:
> I have a file which has 0, 2, or 3 groups that I'd like to record;
> however, in the case of 3 groups, the third group is correctly
> captured, but the first two groups get collapsed into just one group.
> I'm sure that I'm missing something in the way I've constructed my
> regular expression, but I can't figure out what's wrong. Does anyone
> have any suggestions?
>
> The demo below showcases the problem I'm having:
>
> import re
>
> valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
> line1 = "[field1][field2] = blarg"
> line2 = " 'a continuation of blarg'"
> line3 = "[field1][field2][field3] = blorg"
>
> m = valid_line.match(line1)
> print 'Expected: ' + m.group(1) + ', ' + m.group(2)
> m = valid_line.match(line2)
> print 'Expected: ' + str(m.group(1))
> m = valid_line.match(line3)
> print 'Uh-oh: ' + m.group(1) + ', ' + m.group(2)
> --
> http://mail.python.org/mailman/listinfo/python-list
>
Hi,
I believe the space before = is causing the problem (or rather, the pattern not allowing for it): in line3 the third field is followed by " =", but the pattern expects = directly after the closing ], so the regex engine backtracks and lets the greedy first group swallow "field1][field2". Besides allowing optional whitespace before =, you also need the non-greedy quantifier +? to match as little as possible, as opposed to the greedy default:

valid_line = re.compile('^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')

or you can use character classes that explicitly exclude the closing ], like:

valid_line = re.compile('^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')

hth
 vbr
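For anyone who wants to try both suggestions against the three sample lines from the original post, here is a minimal sketch. It assumes Python 3 (print as a function) and raw strings, so the quote inside the character class no longer needs a backslash. The names LINES, LAZY, NEGATED and fields are only illustrative and do not come from the thread.

```python
import re

# Sample lines taken from the original post.
LINES = [
    "[field1][field2] = blarg",
    " 'a continuation of blarg'",
    "[field1][field2][field3] = blorg",
]

# Variant 1: non-greedy \S+? for the first two fields, plus \s* before "=".
LAZY = re.compile(r"^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[']+.*$")

# Variant 2: negated character classes that cannot run past a closing "]".
NEGATED = re.compile(
    r"^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[']+.*$"
)


def fields(pattern, line):
    """Return the tuple of captured groups, or None if the line does not match."""
    m = pattern.match(line)
    return m.groups() if m else None


for line in LINES:
    # Groups that did not take part in the match are reported as None.
    print(repr(line), fields(LAZY, line), fields(NEGATED, line))
```

With either pattern the two-field line should give ('field1', 'field2', None) and the three-field line ('field1', 'field2', 'field3'). The continuation line matches through the second alternative, so all three groups are None there and the caller has to check for that before concatenating strings. The negated-class variant is the stricter of the two: in the first variant the third group is still a greedy \S+, so a line with four bracketed fields would still match and capture "][" inside that group.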