MRAB wrote:
Ethan Furman wrote:

kj wrote:


Sometimes I want to split a string into lines, preserving the
end-of-line markers.  In Perl this is really easy to do, by splitting
on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python.  E.g.:

 >>> import re
 >>> re.split('^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
 >>> re.split('(?m)^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
 >>> bol_re = re.compile('^', re.M)
 >>> bol_re.split('spam\nham\neggs\n')
['spam\nham\neggs\n']

Am I doing something wrong?

kynn


As you probably noticed from the other responses: no, you can't split on _and_ keep the split-by text.

You _can_ split and keep what you split on:

 >>> re.split("(x)", "abxcd")
['ab', 'x', 'cd']
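
Applied to the OP's problem, a capturing group around the newline keeps each separator as its own list item, and gluing every piece back onto the newline that followed it recovers the lines with their endings. This is a minimal sketch assuming bare '\n' line endings only; `split_keep_newlines` is a hypothetical helper name, not something posted in the thread:

```python
import re


def split_keep_newlines(text):
    """Split text into lines, keeping each line's trailing newline.

    Hypothetical helper; assumes lines end in a bare newline only.
    """
    # The capturing group makes re.split return the separators too:
    # 'spam\nham\n' -> ['spam', '\n', 'ham', '\n', '']
    parts = re.split(r'(\n)', text)
    # parts alternates text, separator, text, ..., text (odd length),
    # so pair each text piece with the newline that followed it.
    lines = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    # Anything after the last newline is a final, unterminated line.
    if parts[-1]:
        lines.append(parts[-1])
    return lines


print(split_keep_newlines('spam\nham\neggs\n'))  # ['spam\n', 'ham\n', 'eggs\n']
print(split_keep_newlines('text\ntext\ntext'))   # ['text\n', 'text\n', 'text']
```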

You _can't_ split on a zero-width match:

 >>> re.split("(x*)", "abxcd")
['ab', 'x', 'cd']

but you can use re.sub to replace zero-width matches with something
that's not zero-width and then split on that (best with str.split):

 >>> re.sub("(x*)", "@", "abxcd")
'@a@b@c@d@'
 >>> re.sub("(x*)", "@", "abxcd").split("@")
['', 'a', 'b', 'c', 'd', '']
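
The same sub-then-split trick carries over to the OP's beginning-of-line anchor: replace every zero-width `(?m)^` match with a marker character, then split on the marker with str.split. The sketch below assumes the text never contains NUL, used here as a hypothetical marker, and drops the empty strings left before the first and after the last marker. For readers on newer Pythons: since Python 3.7, `re.split` itself splits on empty matches, so the OP's `re.split('(?m)^', ...)` works without the detour; the zero-width results quoted in this thread reflect the older behaviour.

```python
import re

MARKER = '\0'  # hypothetical sentinel; assumes it never occurs in the input


def split_lines_via_sub(text):
    # Turn every zero-width beginning-of-line match into a real character...
    marked = re.sub(r'(?m)^', MARKER, text)
    # ...then split on that character, dropping the empty strings left
    # before the first marker and after the last one.
    return [piece for piece in marked.split(MARKER) if piece]


def split_lines_direct(text):
    # Python 3.7+ only: re.split accepts patterns that match the empty string.
    return [piece for piece in re.split(r'(?m)^', text) if piece]


sample = 'spam\nham\neggs\n'
print(split_lines_via_sub(sample))  # ['spam\n', 'ham\n', 'eggs\n']
print(split_lines_direct(sample))   # ['spam\n', 'ham\n', 'eggs\n']
```

Filtering out empty pieces is safe here because every real line, even a blank one, still contains its '\n' (or, for an unterminated last line, at least one character).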

Wow! I stand corrected, although I'm in danger of falling over from the dizziness! :)

As impressive as that is, I don't think it does what the OP is looking for. rurpy reminded us (or at least me ;) of .splitlines(), which, called as .splitlines(True) so that the line endings are kept, seems to do exactly what the OP wants. I do take some comfort that my little snippet works for more than newlines alone, although I'm not aware of any other use cases. :(
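
Note that .splitlines() drops the line endings unless its keepends argument is true. It also treats '\r\n' as a single boundary and recognises a lone '\r' (plus a few other line-boundary characters) as a line break, which the '\n'-only regex approaches in this thread do not. A quick illustration:

```python
text = 'spam\nham\neggs\n'

# Default: line endings are removed.
print(text.splitlines())      # ['spam', 'ham', 'eggs']
# Passing True (the keepends argument) preserves them.
print(text.splitlines(True))  # ['spam\n', 'ham\n', 'eggs\n']

# Mixed endings: '\r\n' counts as one boundary and is kept intact.
print('a\r\nb\rc\n'.splitlines(True))  # ['a\r\n', 'b\r', 'c\n']
```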

~Ethan~

Oh, hey, how about this?

re.compile('(^[^\n]*\n?)', re.M).findall('text\ntext\ntext')

Although this does give me an extra blank segment at the end when the string ends with a newline... oh well.
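
The blank segment appears because, in multiline mode, `^` also matches at the very end of a string that ends with '\n', and `[^\n]*\n?` can then match nothing there. A possible variant (not the pattern posted above) makes every alternative consume at least one character, so findall never yields an empty match. This is a sketch assuming '\n' line endings; `LINE_RE` and `split_keep_ends` are hypothetical names:

```python
import re

# Either a (possibly empty) run of non-newlines followed by '\n', or a
# final unterminated run. Both alternatives consume at least one character.
LINE_RE = re.compile(r'[^\n]*\n|[^\n]+')


def split_keep_ends(text):
    return LINE_RE.findall(text)


print(split_keep_ends('spam\nham\neggs\n'))  # ['spam\n', 'ham\n', 'eggs\n']
print(split_keep_ends('text\ntext\ntext'))   # ['text\n', 'text\n', 'text']

# For comparison: the posted pattern leaves an empty item at the end
# when the text ends with a newline.
print(re.compile('(^[^\n]*\n?)', re.M).findall('spam\nham\neggs\n'))
```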
--
http://mail.python.org/mailman/listinfo/python-list
