Re: regex \b behaviour in python

MRAB Thu, 19 Jun 2008 15:56:23 -0700

On Jun 19, 8:46 pm, André Malo <[EMAIL PROTECTED]> wrote:
> * Walter Cruz wrote:
> > irb(main):001:0>"walter ' cruz".split(/\b/)
> > => ["walter", " ' ", "cruz"]
>
> > and in php:
>
> > Array
> > (
> >     [0] =>
> >     [1] => walter
> >     [2] =>  '
> >     [3] => cruz
> >     [4] =>
> > )
>
> > But in python the behaviour of \b is different from ruby or php.
>
> My python here does the same, actually:
>
> $ cat foo.py
> import re
>
> x = "walter ' cruz"
> s = 0
> r = []
> for m in re.finditer(r'\b', x):
>     p = m.start()
>     if s != p:
>         r.append(x[s:p])
>         s = p
>
> print r
>
> $ python2.4 foo.py
> ['walter', " ' ", 'cruz']
> $ python2.5 foo.py
> ['walter', " ' ", 'cruz']
> $
>
Another way is:


>>> re.split(r"(\W+)", "walter ' cruz")
['walter', " ' ", 'cruz']

\W+ matches each run of non-word characters, and the capturing
parentheses cause those runs to be returned as well.
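
To make the effect of the parentheses concrete, here is a small
side-by-side sketch. The variable names are only for illustration;
the only call used is re.split:

```python
import re

text = "walter ' cruz"

# Without a group, the separators matched by \W+ are discarded.
without_group = re.split(r"\W+", text)

# With a capturing group, each separator is kept in the result list.
with_group = re.split(r"(\W+)", text)

print(without_group)  # ['walter', 'cruz']
print(with_group)     # ['walter', " ' ", 'cruz']
```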

I'm surprised that splitting on \b doesn't work as expected, so it
might be that re.split has been defined to split only on matches of
one or more characters, never on empty matches. Is it something that
should be 'fixed'?
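
If re.split on your interpreter does ignore empty matches, the
finditer approach quoted above can be wrapped in a helper. This is
only a rough sketch: split_at_boundaries is a made-up name, not
anything in the re module, and it assumes you want empty pieces
dropped and any leading or trailing non-word run kept. (As far as I
know, Python 3.7 and later do split on empty matches, in which case
re.split(r"\b", ...) also returns empty strings at the ends, much
like the php output.)

```python
import re

def split_at_boundaries(text):
    """Split text at every \\b position, dropping empty pieces.

    Hypothetical helper for illustration; not part of the re module.
    """
    # Offsets of all zero-width word-boundary matches.
    cuts = [m.start() for m in re.finditer(r"\b", text)]
    # Include both ends of the string so that leading and trailing
    # non-word runs are not lost; set() removes duplicate offsets.
    edges = sorted(set([0] + cuts + [len(text)]))
    # Slice between each pair of neighbouring offsets.
    return [text[a:b] for a, b in zip(edges, edges[1:])]

print(split_at_boundaries("walter ' cruz"))  # ['walter', " ' ", 'cruz']
```

Unlike the loop in foo.py, this also keeps a trailing piece such as
the "!" in "hi!", because the end of the string is always treated as
a cut point.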
