On Jun 19, 8:46 pm, André Malo <[EMAIL PROTECTED]> wrote:
> * Walter Cruz wrote:
> > irb(main):001:0>"walter ' cruz".split(/\b/)
> > => ["walter", " ' ", "cruz"]
>
> > and in php:
>
> > Array
> > (
> >     [0] =>
> >     [1] => walter
> >     [2] =>  '
> >     [3] => cruz
> >     [4] =>
> > )
>
> > But in python the behaviour of \b is different from ruby or php.
>
> My python here does the same, actually:
>
> $ cat foo.py
> import re
>
> x = "walter ' cruz"
> s = 0
> r = []
> for m in re.finditer(r'\b', x):
>     p = m.start()
>     if s != p:
>         r.append(x[s:p])
>         s = p
>
> print r
>
> $ python2.4 foo.py
> ['walter', " ' ", 'cruz']
> $ python2.5 foo.py
> ['walter', " ' ", 'cruz']
> $
>
Another way is:

>>> re.split(r"(\W+)", "walter ' cruz")
['walter', " ' ", 'cruz']

\W+ matches the runs of non-word characters, and the capturing
parentheses cause those separators to be returned as well.
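
To make the effect of the parentheses visible, here is a small sketch
that runs the same split with and without the capturing group. The
variable names are mine, chosen only for illustration.

```python
import re

text = "walter ' cruz"

# Without a capturing group the separators are discarded.
words_only = re.split(r"\W+", text)

# With a capturing group the separators are kept in the result list.
with_separators = re.split(r"(\W+)", text)

print(words_only)       # ['walter', 'cruz']
print(with_separators)  # ['walter', " ' ", 'cruz']
```

Note that if the string starts or ends with non-word characters, the
result gets an empty string at that end, so it is not an exact
replacement for a split on \b in every case.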

I'm surprised that splitting on \b doesn't work as expected. It may be
that re.split is defined to split only on matches of one or more
characters, never on an empty match. Is this something that should be
'fixed'?
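
In the meantime, the finditer loop quoted above can be wrapped in a
helper. This is only a sketch: split_zero_width is a name I made up,
not anything in the re module, and it is meant for patterns that match
the empty string, such as \b. Unlike the quoted loop, it also keeps
whatever text follows the last match.

```python
import re

def split_zero_width(pattern, text):
    """Cut text at every position where pattern matches.

    Hypothetical helper, intended for zero-width patterns like \\b.
    """
    pieces = []
    start = 0
    for m in re.finditer(pattern, text):
        pos = m.start()
        if pos != start:          # skip matches that would give an empty piece
            pieces.append(text[start:pos])
            start = pos
    if start < len(text):         # keep the tail after the last match
        pieces.append(text[start:])
    return pieces

print(split_zero_width(r"\b", "walter ' cruz"))  # ['walter', " ' ", 'cruz']
```

With a pattern that consumes characters, the matched text stays at the
front of the following piece, so for that case re.split remains the
better tool.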
--
http://mail.python.org/mailman/listinfo/python-list
