2009/10/14 Timur Tabi <timur.t...@gmail.com>:
> I'm having trouble creating a regex pattern that matches a string that
> has an optional substring in it. What I'm looking for is a pattern
> that matches both of these strings:
>
> Subject: [PATCH 08/18] This is the patch name
> Subject: This is the patch name
>
> What I want is to extract the "This is the patch name". I tried this:
>
> m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)
>
> Unfortunately, the second group appears to be too greedy, and returns
> this:
>
> >>> print m.group(1)
> None
> >>> print m.group(2)
> [PATCH 08/18] Subject line
It's not that the second group is too greedy. The first group isn't
matching what you want it to, because neither \w nor \s matches the "/"
inside your brackets. Since that group is followed by '*', it is allowed
to match zero times, so it gets skipped and '(.*)' picks up the whole
rest of the line.

This works for your example input:

>>> import re
>>> pattern = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")
>>> for s in (
...     "Subject: [PATCH 08/18] This is the patch name",
...     "Subject: This is the patch name",
... ):
...     pattern.search(s).group(1)
...
'This is the patch name'
'This is the patch name'

Going through the changes from your original regex in order:

'(?:etc)' instead of '(etc)' is a non-capturing group (since you
apparently don't care about that bit).

'[^\]]' instead of '[\w\s]' matches "everything except a closing
bracket".

'?' instead of '*' after the bracketed group makes it optional, but
matches it at most once.

The '\s*' before the second set of parentheses takes out the leading
whitespace that would otherwise be returned as part of the match.

-[]z.
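
If you want this in a script rather than at the prompt, here is a minimal sketch. The names extract_patch_name, SUBJECT_RE and ORIGINAL_RE are made up for illustration; the patterns are the same two discussed above. It also guards against lines that don't match at all, since search() returns None in that case.

```python
import re

# Pattern from the reply: optional, non-capturing [tag], then the subject text.
SUBJECT_RE = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")

# Pattern from the question, kept only to show why it fails on "08/18".
ORIGINAL_RE = re.compile(r"Subject:\s*(\[[\w\s]*\])*(.*)")


def extract_patch_name(line):
    """Return the subject text without a leading [tag], or None if no match."""
    m = SUBJECT_RE.search(line)
    if m is None:
        return None
    return m.group(1)


if __name__ == "__main__":
    for line in (
        "Subject: [PATCH 08/18] This is the patch name",
        "Subject: This is the patch name",
        "From: someone else",
    ):
        print(repr(extract_patch_name(line)))
```

Note that '(.*)' still takes everything to the end of the line, so a second bracketed tag (if your input can have one) would end up in the result.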