Bugs item #1105286, was opened at 2005-01-19 10:04 Message generated for change (Comment added) made by tim_one You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1105286&group_id=5470
Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: YoHell (yohell) Assigned to: Nobody/Anonymous (nobody) Summary: Undocumented implicit strip() in split(None) string method Initial Comment: Hi! I noticed that the string method split() first does an implicit strip() before splitting when it's used with no arguments or with None as the separator (sep in the docs). There is no mention of this implicit strip() in the docs. Example 1: s = " word1 word2 " s.split() then returns ['word1', 'word2'] and not ['', 'word1', 'word2', ''] as one might expect. WHY IS THIS BAD? 1. Because it's undocumented. See: http://www.python.org/doc/current/lib/string-methods.html#l2h-197 2. Because it may lead to unexpected behavior in programs. Example 2: FASTA sequence headers are one line descriptors of biological sequences and are on this form: ">" + Identifier + whitespace + free text description. Let sHeader be a Python string containing a FASTA header. One could then use the following syntax to extract the identifier from the header: sID = sHeader[1:].split(None, 1)[0] However, this does not work if sHeader contains a faulty FASTA header where the identifier is missing or consists of whitespace. In that case sID will contain the first word of the free text description, which is not the desired behavior. WHAT SHOULD BE DONE? The implicit strip() should be removed, or at least should programmers be given the option to turn it off. At the very least it should be documented so that programmers have a chance of adapting their code to it. Thank you for an otherwise splendid language! /Joel Hedlund Ph.D. Student IFM Bioinformatics Linköping University ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2005-01-19 11:56 Message: Logged In: YES user_id=31435 I think the docs for split() under "String Methods" are quite clear: """ ... If sep is not specified or is None, a different splitting algorithm is applied. Words are separated by arbitrary length strings of whitespace characters (spaces, tabs, newlines, returns, and formfeeds). Consecutive whitespace delimiters are treated as a single delimiter ("'1 2 3'.split()" returns "['1', '2', '3']"). Splitting an empty string returns "['']". """ This won't change, because mountains of code rely on this behavior -- it's probably the single most common use case for .split(). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1105286&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com