Raymond Hettinger added the comment:

A few randomly ordered thoughts about splitting:

* The best general purpose text splitter I've ever seen is in MS Excel and is 
called "Text to Columns".  It has a boolean flag, "treat consecutive delimiters 
as one" which is off by default.

* There is a nice discussion on the complexities of the current design on 
StackOverflow:  http://stackoverflow.com/questions/16645083  In addition, there 
are many other SO questions about the behavior of str.split().

* The learning curve for str.split() is already high.  The doc entry for it has 
been revised many times to try to explain what it does.  I'm concerned that 
adding another algorithmic option to it may make it more difficult to learn and 
use in the common cases (API design principle:  giving users more options can 
impair usability).  Usually in Python courses, I recommend using str.split() 
for the simple, common cases and using regex when you need more control.

* What I do like about the proposal is that there is currently no clean way to 
take the default whitespace splitting algorithm and customize it to a 
particular subset of whitespace (e.g. tabs only).

* A tangential issue is that it was a mistake to expose the maxsplit=-1 
implementation detail.   In Python 2.7, the help was "S.split([sep 
[,maxsplit]])".  But the folks implementing Argument Clinic had no way of 
coping with optional arguments that don't have a default value (like dict.pop), 
so they changed the API so that the implementation detail was exposed, 
"S.split(sep=None, maxsplit=-1)".   IMO, this is an API regression.  We really 
don't want people passing in -1 to indicate that there are no limits.  The 
Python way would have been to use None as a default or to stick with the 
existing API where the number of arguments supplied is part of the API (much 
like type() has two different meanings depending on whether it has an arity of 
1 or 3).
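
To make the behaviors in the points above concrete, here is a small sketch 
that relies only on documented str.split() and type() behavior.  The sample 
strings and the "Point" class are made up for illustration and are not part of 
the proposal:

```python
# Sample data invented for illustration.
row = "a\t\tb  c"

# Default (sep=None): a run of any whitespace counts as one separator.
default_fields = row.split()

# Explicit separator: consecutive delimiters are NOT merged, so an empty
# field appears between the two tabs.
tab_fields = row.split("\t")

# maxsplit=-1 is the exposed "no limit" value; it behaves like omitting it.
unlimited = "a,b,c".split(",", -1)
limited = "a,b,c".split(",", 1)

# type() is the arity-based API mentioned above: one argument reports a
# type, three arguments create a new class.
kind = type(row)
Point = type("Point", (), {"dims": 2})

print(default_fields)  # ['a', 'b', 'c']
print(tab_fields)      # ['a', '', 'b  c']
print(unlimited)       # ['a', 'b', 'c']
print(limited)         # ['a', 'b,c']
```

Note that neither call gives "tabs only, with consecutive tabs treated as one", 
which is the gap the proposal is aimed at.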

Overall, I'm +0 on the proposal, but good consideration should be given to 
1) whether there is sufficient need to warrant increasing API complexity, 
making split() more difficult to learn and remember, 2) whether "prune" is the 
right word (can someone who didn't write the code read it clearly 
afterwards?), and 3) whether this could instead be addressed through 
documentation (e.g. by showing the simple regexes needed for cases not covered 
by str.split).
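
As a rough idea of what such a documentation example might look like, here is 
a minimal sketch for the tabs-only case.  The helper name split_on_tabs is 
hypothetical, and this is only one of several possible regex spellings:

```python
import re

def split_on_tabs(text):
    """Return the maximal runs of non-tab characters in text.

    Hypothetical helper: mimics the default whitespace algorithm
    (consecutive separators merged, leading/trailing separators ignored)
    but treats only tabs as separators.
    """
    return re.findall(r"[^\t]+", text)

print(split_on_tabs("\tname\t\tfirst last\t"))  # ['name', 'first last']
print(split_on_tabs(""))                        # []
```

Spaces inside a field survive because only tabs are treated as separators, and 
an empty input gives an empty list, just as "".split() does.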

----------
nosy: +rhettinger

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28937>
_______________________________________