On Wednesday, 21 November 2012 20:25:10 UTC+1, Hans Mulder wrote:
> On 21/11/12 17:59:05, Alister wrote:
> > On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
> >
> >> I just came across this:
> >>
> >> >>> 'spam'.find('', 5)
> >> -1
> >>
> >> Now, reading find's documentation:
> >>
> >> >>> print(str.find.__doc__)
> >> S.find(sub [,start [,end]]) -> int
> >>
> >> Return the lowest index in S where substring sub is found,
> >> such that sub is contained within S[start:end]. Optional arguments
> >> start and end are interpreted as in slice notation.
> >>
> >> Return -1 on failure.
> >>
> >> Now, the empty string is a substring of every string, so how can find
> >> fail?
> >> find, from the doc, should generally be equivalent to
> >> S[start:end].find(substring) + start, except if the substring is not
> >> found, but since the empty string is a substring of the empty string it
> >> should never fail.
> >>
> >> Looking at the source code for find (in stringlib/find.h):
> >>
> >> Py_LOCAL_INLINE(Py_ssize_t)
> >> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
> >>                const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
> >>                Py_ssize_t offset)
> >> {
> >>     Py_ssize_t pos;
> >>
> >>     if (str_len < 0)
> >>         return -1;
> >>
> >> I believe it should be:
> >>
> >>     if (str_len < 0)
> >>         return (sub_len == 0 ? 0 : -1);
> >>
> >> Is there any reason for having this unexpected behaviour, or was this
> >> simply overlooked?
> >
> > Why would you be searching for an empty string?
> > What result would you expect to get from such a search?
>
> In general, if
>
>     needle in haystack[start:]
>
> returns True, then you'd expect
>
>     haystack.find(needle, start)
>
> to return the smallest i >= start such that
>
>     haystack[i:i+len(needle)] == needle
>
> also returns True.
>
> >>> "" in "spam"[5:]
> True
> >>> "spam"[5:5+len("")] == ""
> True
> >>>
>
> So, you'd expect that "spam".find("", 5) would return 5.
>
> The only other consistent position would be that "spam"[5:]
> should raise an IndexError, because 5 is an invalid index.
>
> For that matter, I wouldn't mind if "spam".find(s, 5) were
> to raise an IndexError. But if slicing at position 5
> produces an empty string, then .find should be able to
> find that empty string.
>
> -- HansM

Exactly! Either string[i:] with i >= len(string) should raise an IndexError, or string.find('', i) should return i.

Anyway, thinking about it, this inconsistency can be solved in a simpler way, without adding a comparison: check the substring length first. If it is 0, you already know that it is a substring of the given string, and you return the offset. So the two ifs at the beginning of the function ought to be swapped.

--
http://mail.python.org/mailman/listinfo/python-list
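
As a rough illustration of the swap, here is a small Python model of the two orderings. It is only a sketch under my own assumptions: the function names are made up, start is assumed to be non-negative, and this is not the actual stringlib code. The last helper checks the condition Hans described (it verifies that the returned index is valid, not that it is the smallest one).

```python
def find_length_check_first(s, sub, start=0):
    """Hypothetical model: the length check runs before the empty-needle check."""
    region_len = len(s) - start      # plays the role of str_len in the C code
    if region_len < 0:               # start lies past the end of s
        return -1
    if len(sub) == 0:                # empty needle matches at the offset
        return start
    pos = s[start:].find(sub)
    return pos + start if pos >= 0 else -1


def find_empty_check_first(s, sub, start=0):
    """Same model with the two initial checks swapped, as proposed above."""
    if len(sub) == 0:                # empty needle: answer is the offset
        return start
    if len(s) - start < 0:
        return -1
    pos = s[start:].find(sub)
    return pos + start if pos >= 0 else -1


def satisfies_invariant(find, s, sub, start):
    """If sub is in s[start:], find must return i >= start with s[i:i+len(sub)] == sub."""
    if sub not in s[start:]:
        return find(s, sub, start) == -1
    i = find(s, sub, start)
    return i >= start and s[i:i + len(sub)] == sub


if __name__ == "__main__":
    for find in (find_length_check_first, find_empty_check_first):
        print(find.__name__,
              find("spam", "", 5),
              satisfies_invariant(find, "spam", "", 5))
```

With the checks in the first order, the empty needle past the end of the string is reported as not found, which is the case this thread is about. With the checks swapped, the model returns the offset and the condition holds. For non-empty needles the two models behave the same.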