Bugs item #1105286, was opened at 2005-01-19 16:04 Message generated for change (Comment added) made by yohell You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1105286&group_id=5470
Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: YoHell (yohell) Assigned to: Raymond Hettinger (rhettinger) Summary: Undocumented implicit strip() in split(None) string method Initial Comment: Hi! I noticed that the string method split() first does an implicit strip() before splitting when it's used with no arguments or with None as the separator (sep in the docs). There is no mention of this implicit strip() in the docs. Example 1: s = " word1 word2 " s.split() then returns ['word1', 'word2'] and not ['', 'word1', 'word2', ''] as one might expect. WHY IS THIS BAD? 1. Because it's undocumented. See: http://www.python.org/doc/current/lib/string-methods.html#l2h-197 2. Because it may lead to unexpected behavior in programs. Example 2: FASTA sequence headers are one line descriptors of biological sequences and are on this form: ">" + Identifier + whitespace + free text description. Let sHeader be a Python string containing a FASTA header. One could then use the following syntax to extract the identifier from the header: sID = sHeader[1:].split(None, 1)[0] However, this does not work if sHeader contains a faulty FASTA header where the identifier is missing or consists of whitespace. In that case sID will contain the first word of the free text description, which is not the desired behavior. WHAT SHOULD BE DONE? The implicit strip() should be removed, or at least should programmers be given the option to turn it off. At the very least it should be documented so that programmers have a chance of adapting their code to it. Thank you for an otherwise splendid language! /Joel Hedlund Ph.D. Student IFM Bioinformatics Linköping University ---------------------------------------------------------------------- >Comment By: YoHell (yohell) Date: 2005-01-20 15:59 Message: Logged In: YES user_id=1008220 Brilliant, guys! Thanks again for a superb scripting language, and with documentation to match! Take care! /Joel Hedlund ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2005-01-20 15:50 Message: Logged In: YES user_id=80475 The prosposed wording is fine. If there are no objections or concerns, I'll apply it soon. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2005-01-20 15:28 Message: Logged In: YES user_id=764593 Replacing the quoted line: """ ... If sep is not specified or is None, a different splitting algorithm is applied. First whitespace (spaces, tabs, newlines, returns, and formfeeds) is stripped from both ends. Then words are separated by arbitrary length strings of whitespace characters . Consecutive whitespace delimiters are treated as a single delimiter ("'1 2 3'.split()" returns "['1', '2', '3']"). Splitting an empty (or whitespace- only) string returns "['']". """ ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2005-01-20 15:04 Message: Logged In: YES user_id=80475 What new wording do you propose to be added? ---------------------------------------------------------------------- Comment By: YoHell (yohell) Date: 2005-01-20 11:15 Message: Logged In: YES user_id=1008220 In RE to tim_one: > I think the docs for split() under "String Methods" are quite > clear: On the countrary, my friend, and here's why: > """ > ... > If sep is not specified or is None, a different splitting > algorithm is applied. This sentecnce does not say that whitespace will be implicitly stripped from the edges of the string. > Words are separated by arbitrary length strings of whitespace > characters (spaces, tabs, newlines, returns, and formfeeds). Neither does this one. > Consecutive whitespace delimiters are treated as a single delimiter ("'1 > 2 3'.split()" returns "['1', '2', '3']"). And not that one. > Splitting an empty string returns "['']". > """ And that last one does not mention it either. In fact, there is no mention in the docs of how separators on edges of strings are treated by the split method. And furthermore, there is no mention of that s.split(sep) treats them differrently when sep is None than it does otherwise. Example: >>> ",2,".split(',') ['', '2', ''] >>> " 2 ".split() ['2'] This inconsistent behavior is not in line with how beautifully thought out the Python language is otherwise, and how brilliantly everything else is documented on the http://python.org/doc/ documentation pages. > This won't change, because mountains of code rely on this > behavior -- it's probably the single most common use case > for .split(). I thought as much. However - it's would be Really easy for an admin to add a line of documentation to .split() to explain this. That would certainly help make me a happier man, and hopefully others too. Cheers guys! /Joel ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2005-01-19 17:56 Message: Logged In: YES user_id=31435 I think the docs for split() under "String Methods" are quite clear: """ ... If sep is not specified or is None, a different splitting algorithm is applied. Words are separated by arbitrary length strings of whitespace characters (spaces, tabs, newlines, returns, and formfeeds). Consecutive whitespace delimiters are treated as a single delimiter ("'1 2 3'.split()" returns "['1', '2', '3']"). Splitting an empty string returns "['']". """ This won't change, because mountains of code rely on this behavior -- it's probably the single most common use case for .split(). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1105286&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com