On Oct 10, 4:59 pm, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
> 2011/10/10 galyle <gal...@gmail.com>:
> > Hi, I've looked through this forum, but I haven't been able to find a
> > resolution to the problem I'm having (maybe I didn't look hard enough
> > -- I have to believe this has come up before). The problem is this:
> > I have a file which has 0, 2, or 3 groups that I'd like to record;
> > however, in the case of 3 groups, the third group is correctly
> > captured, but the first two groups get collapsed into just one group.
> > I'm sure that I'm missing something in the way I've constructed my
> > regular expression, but I can't figure out what's wrong. Does anyone
> > have any suggestions?
> >
> > The demo below showcases the problem I'm having:
> >
> > import re
> >
> > valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\
> > [\']+.*$')
> > line1 = "[field1][field2] = blarg"
> > line2 = " 'a continuation of blarg'"
> > line3 = "[field1][field2][field3] = blorg"
> >
> > m = valid_line.match(line1)
> > print 'Expected: ' + m.group(1) + ', ' + m.group(2)
> > m = valid_line.match(line2)
> > print 'Expected: ' + str(m.group(1))
> > m = valid_line.match(line3)
> > print 'Uh-oh: ' + m.group(1) + ', ' + m.group(2)
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> Hi,
> I believe the space before = is causing problems (or rather, the
> pattern is missing it); you also need non-greedy quantifiers (+?) to
> match as little as possible, as opposed to the greedy default:
>
> valid_line = re.compile('^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')
>
> Or you can use patterns that explicitly exclude the closing ], like:
>
> valid_line = re.compile('^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
>
> hth
> vbr

Thanks, I had a feeling that greedy matching in my expression was causing the problem. Your suggestion makes sense to me, and works quite well.
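
For anyone finding this thread later, here is a minimal sketch comparing the original pattern with the negated-class variant suggested above, run against the three-group line. The names greedy and bounded are only illustrative, and the raw strings and print() calls are my own assumptions (Python 3 syntax), not part of the original posts.

```python
import re

# Original pattern from the question: the greedy \S+ is free to run
# across "][" and swallow the first two fields into one group.
greedy = re.compile(r"^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[']+.*$")

# Variant from the reply: [^\]]+ cannot cross a closing bracket, and
# \s*= tolerates the space before "=" after a third field.
bounded = re.compile(
    r"^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[']+.*$"
)

line1 = "[field1][field2] = blarg"
line3 = "[field1][field2][field3] = blorg"

greedy_groups = greedy.match(line3).groups()
bounded_groups = bounded.match(line3).groups()
two_field_groups = bounded.match(line1).groups()

print(greedy_groups)     # first two fields collapsed into group 1
print(bounded_groups)    # one field per group
print(two_field_groups)  # third group is None when only two fields exist
```

With the bounded pattern, group 3 comes back as None for two-field lines, so callers should check for that before using it.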