Re: How to prevent re.split() from removing part of string

Jeremy Tue, 01 Dec 2009 06:32:38 -0800

On Nov 30, 5:24 pm, MRAB <[email protected]> wrote:
> Jeremy wrote:
> > I am using re.split to... well, split a string into sections.  I want
> > to split when, following a new line, there are 4 or fewer spaces.  The
> > pattern I use is:
>
> >         sections = re.split(r'\n\s{,4}[^\s]', lineoftext)
>
> > This splits appropriately but I lose the character matched by [^\s].  I
> > know I can put parentheses around [^\s] and keep the matched character,
> > but the character is placed in its own element of the list instead of
> > with the rest of the lineoftext.
>
> > Does anyone know how I can accomplish this without losing the matched
> > character?
>
> First of all, \s matches any character that's _whitespace_, such as
> space, "\t", "\n", "\r", "\f". There's also \S, which matches any
> character that's not whitespace.


Thanks for the reminder.  I knew \S existed, but must have forgotten
about it.
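
For anyone finding this in the archives, a quick sketch of the
equivalence. The sample string is made up for illustration; it is not
from my actual data.

```python
import re

# Hypothetical test string containing a space, a tab and a newline.
sample = "a b\tc\nd s"

# \S and [^\s] describe the same set: every character that is not whitespace.
non_ws = re.findall(r'\S', sample)
negated = re.findall(r'[^\s]', sample)
assert non_ws == negated

# [^s] (no backslash) is a different class: anything except the letter "s",
# so it happily matches spaces, tabs and newlines.
not_letter_s = re.findall(r'[^s]', sample)
print(non_ws)
print(not_letter_s)
```

So \S is just the shorter spelling of [^\s], and the backslash matters.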
>
> But in answer to your question, use a look-ahead:
>
>      sections = re.split(r'\n {,4}(?=\S)', lineoftext)

Yep, that does the trick.  Thanks for the help!
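
In case it helps someone else, here is a minimal sketch comparing the
three variants. The value of lineoftext below is a hypothetical sample,
not my real input; lines indented by more than 4 spaces are meant to
stay with the section above them.

```python
import re

# Hypothetical input: "more alpha" is indented 8 spaces, so it is a
# continuation line; "beta" (2 spaces) and "gamma" (0 spaces) start sections.
lineoftext = "alpha\n        more alpha\n  beta\ngamma"

# Original pattern: [^\s] consumes the first character of each new section,
# so that character is dropped from the result.
lossy = re.split(r'\n\s{,4}[^\s]', lineoftext)

# Capturing group: the character is kept, but as a separate list element.
grouped = re.split(r'\n\s{,4}([^\s])', lineoftext)

# Look-ahead: (?=\S) only checks that a non-whitespace character follows
# and does not consume it, so it stays attached to its section.
sections = re.split(r'\n {,4}(?=\S)', lineoftext)

print(lossy)     # ['alpha\n        more alpha', 'eta', 'amma']
print(grouped)   # ['alpha\n        more alpha', 'b', 'eta', 'g', 'amma']
print(sections)  # ['alpha\n        more alpha', 'beta', 'gamma']
```

The look-ahead version also uses a literal space instead of \s, so a
blank line (a second "\n") is not counted as part of the indentation.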

Jeremy

-- 
http://mail.python.org/mailman/listinfo/python-list
