
2009/10/14 Timur Tabi <timur.t...@gmail.com>:
> I'm having trouble creating a regex pattern that matches a string that
> has an optional substring in it.  What I'm looking for is a pattern
> that matches both of these strings:
>
> Subject: [PATCH 08/18] This is the patch name
> Subject: This is the patch name
>
> What I want is to extract the "This is the patch name".  I tried this:
>
> m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)
>
> Unfortunately, the second group appears to be too greedy, and returns
> this:
>
>>>> print m.group(1)
> None
>>>> print m.group(2)
> [PATCH 08/18] Subject line

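It's not that the second group is too greedy. The first group isn't
matching what you want it to, because neither \w nor \s matches the "/"
inside your brackets. Since that group is followed by '*', it is allowed
to match zero times, so it comes back as None and '(.*)' picks up the
rest of the line. This works for your example input: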

>>> import re
>>> pattern = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")
>>> for s in (
...     "Subject: [PATCH 08/18] This is the patch name",
...     "Subject: This is the patch name",
... ):
...     pattern.search(s).group(1)
...
'This is the patch name'
'This is the patch name'
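
For comparison, here is a rough sketch of what the original pattern does
with the same input. The names `original` and `line` are mine, purely for
illustration, and the pattern is written as a raw string so the
backslashes reach the regex engine untouched.

```python
import re

# The pattern from the question, unchanged apart from the r"" prefix.
original = re.compile(r'Subject:\s*(\[[\w\s]*\])*(.*)')
line = "Subject: [PATCH 08/18] This is the patch name"

m = original.search(line)

# [\w\s]* consumes "PATCH 08" and stops at "/", so the closing \] cannot
# match there. The group is followed by "*", so zero repetitions are
# allowed, and (.*) then takes everything after the leading whitespace.
print(m.group(1))  # None
print(m.group(2))  # [PATCH 08/18] This is the patch name
```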

Going through the changes from your original regex in order:

'(?:etc)' instead of '(etc)' makes the parentheses non-capturing (since
you apparently don't care about that bit), so the patch name ends up in
group 1.

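'[^\]]' instead of '[\w\s]' matches any character except a closing
bracket, so the "/" in "08/18" no longer stops the match.

'?' instead of '*' after the bracketed part makes it optional (zero or
one occurrence) rather than repeatable.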

The '\s*' before the second set of parentheses takes out the leading
whitespace that would otherwise be returned as part of the match.
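
If it helps, the same pattern can be spelled out with re.VERBOSE so each
of the pieces above sits next to its own comment. This is only a sketch:
`SUBJECT_RE` and `patch_name` are hypothetical names I made up, and
returning None for a non-matching line is my assumption about what you
would want.

```python
import re

# Same pattern as above, laid out one piece per line.
SUBJECT_RE = re.compile(r"""
    Subject:\s*        # literal header name, then optional whitespace
    (?:\[[^\]]*\])?    # optional bracketed tag, not captured
    \s*                # whitespace between the tag and the title
    (.*)               # the title itself (group 1)
""", re.VERBOSE)


def patch_name(line):
    """Return the subject text without the optional [...] tag, or None."""
    m = SUBJECT_RE.search(line)
    return m.group(1) if m else None


print(patch_name("Subject: [PATCH 08/18] This is the patch name"))
print(patch_name("Subject: This is the patch name"))
print(patch_name("From: someone"))  # no Subject: header, so None
```

The first two calls should both give 'This is the patch name'.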

 -[]z.
-- 
http://mail.python.org/mailman/listinfo/python-list
