On 11/24/2010 10:46 PM, Phlip wrote:
> HypoNt:
> I need to turn a human-readable list into a list():
>    print re.search(r'(?:(\w+), |and (\w+))+', 'whatever a, bbb, and c').groups()
> That currently returns ('c',). I'm trying to match "any word \w+
> followed by a comma, or a final word preceded by and."
> The match returns 'a, bbb, and c', but the groups return ('bbb', 'c').
> What do I type for .groups() to also get the 'a'?
> Please go easy on me (and no RTFM!), because I have only been using
> regular expressions for about 20 years...

A lazy way is to write a pattern for the separators and hand it to
re.split(). I assume that " and " and ", " are both acceptable in any
position.

The best I've been able to do so far is below. Because re.split() also
returns the text matched by any capturing group in the pattern, I have
to throw away every second element:

>>> re.split(r"\s*(,|and)?\s*", 'whatever a, bbb, and c')[::2]
['whatever', 'a', 'bbb', '', 'c']

That empty string appears because the ", and" isn't recognised as a
single delimiter: the "," and the "and" each match separately, leaving
an empty item between them.
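
One possible way around both annoyances, as a rough sketch rather than a
tested solution: make the groups non-capturing so split() returns only
the items, and let a single delimiter swallow an optional "and" after a
comma or after whitespace. The names DELIMITER and split_items are mine,
purely for illustration. This version also never matches the empty
string, which matters because re.split() on Python 3.7 and later splits
on empty matches, so the one-liner above would behave differently there.

```python
import re

# Hypothetical names, for illustration only.
# A delimiter is either a comma (optionally followed by "and") or a run
# of whitespace (optionally followed by "and"). The groups are
# non-capturing, so re.split() returns only the items themselves.
DELIMITER = re.compile(r'\s*,\s*(?:and\s+)?|\s+(?:and\s+)?')

def split_items(text):
    """Split a human-readable list on commas, whitespace and 'and'."""
    return DELIMITER.split(text)

items = split_items('whatever a, bbb, and c')
print(items)  # ['whatever', 'a', 'bbb', 'c']
```

Like the original, this still treats plain whitespace as a separator, so
'whatever' comes back as an item of its own.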

A parsing package might give you better results.
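
Short of a parsing package, re.findall() may get closer to what the
question asked for. A repeated group keeps only its last match, which is
why .groups() loses the 'a'; dropping the outer (?:...)+ and letting
findall() do the repeating keeps every match. This is only a sketch
under that assumption, and ITEM and list_items are made-up names:

```python
import re

# Hypothetical names, for illustration only.
# Same two alternatives as the pattern in the question, minus the outer
# repetition: findall() repeats the search and keeps every match.
ITEM = re.compile(r'(\w+), |and (\w+)')

def list_items(text):
    """Return each word followed by ', ' plus the word after 'and '."""
    # Each match is a (before_comma, after_and) tuple; one slot is empty.
    return [first or second for first, second in ITEM.findall(text)]

print(list_items('whatever a, bbb, and c'))  # ['a', 'bbb', 'c']
```

Note that it picks up any word followed by ", " anywhere in the string,
so stray commas earlier in the text would add unwanted items.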

