On 2012-11-21 19:25, Hans Mulder wrote:
On 21/11/12 17:59:05, Alister wrote:
On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:

I just came across this:

'spam'.find('', 5)
-1


Now, reading find's documentation:

print(str.find.__doc__)
S.find(sub [,start [,end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end].  Optional arguments
start and end are interpreted as in slice notation.

Return -1 on failure.

Now, the empty string is a substring of every string, so how can find
fail?
From the doc, find should generally be equivalent to
S[start:end].find(substring) + start, except when the substring is not
found; but since the empty string is a substring of the empty string, it
should never fail.
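A literal reading of that equivalence can be sketched in Python. `doc_find` is a hypothetical helper, not part of any library. It assumes that a negative `start` is normalised as in slice notation before being added back to the result.

```python
def doc_find(s, sub, start=0, end=None):
    """Hypothetical helper: S[start:end].find(sub) + start, as the docstring reads."""
    n = len(s)
    if start < 0:
        # Slice notation counts negative indices from the end.
        start = max(0, start + n)
    if end is None:
        end = n
    pos = s[start:end].find(sub)
    # Shift a hit back into S's coordinates; keep -1 for a miss.
    return -1 if pos == -1 else pos + start


print(doc_find("spam", "", 5))    # 5, while "spam".find("", 5) gives -1
print(doc_find("spam", "am", -2))  # 2
```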

Looking at the source code for find(in stringlib/find.h):

Py_LOCAL_INLINE(Py_ssize_t)
stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
               const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
               Py_ssize_t offset)
{
    Py_ssize_t pos;

    if (str_len < 0)
        return -1;

I believe it should be:

    if (str_len < 0)
        return (sub_len == 0 ? 0 : -1);

Is there any reason for having this unexpected behaviour, or was this
simply overlooked?
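To see why the early return in the quoted C code fires, here is a rough Python model of that path. Beyond what the excerpt shows, it assumes that the caller clamps `end` to `len(S)`, normalises a negative `start`, passes `end - start` as `str_len`, and returns `start` for an empty needle. `model_find` is a hypothetical name.

```python
def model_find(s, sub, start=0):
    """Rough model of the quoted stringlib_find path (assumptions noted inline)."""
    n = len(s)
    end = n                        # ASSUMPTION: end defaults to, and is clamped to, len(s)
    if start < 0:                  # ASSUMPTION: negative start is normalised by the caller
        start = max(0, start + n)
    str_len = end - start          # ASSUMPTION: this is the str_len passed to stringlib_find
    if str_len < 0:                # the early exit quoted above
        return -1
    if len(sub) == 0:              # ASSUMPTION: an empty needle matches at the region's start
        return start
    pos = s[start:end].find(sub)
    return -1 if pos == -1 else pos + start


print(model_find("spam", "", 4))  # 4: str_len == 0, so the empty string is found
print(model_find("spam", "", 5))  # -1: str_len == -1 triggers the early return
```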

Why would you be searching for an empty string?
What result would you expect to get from such a search?


In general, if

     needle in haystack[ start: ]

returns True, then you'd expect

     haystack.find(needle, start)

to return the smallest i >= start such that

     haystack[i:i+len(needle)] == needle

also returns True.

"" in "spam"[5:]
True
"spam"[5:5+len("")] == ""
True


So, you'd expect that "spam".find("", 5) would return 5.
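The definition above can be written out directly as a sketch. `spec_find` is a hypothetical helper that only handles a non-negative `start`; it checks the `in` condition first and then walks forward to the smallest matching `i`.

```python
def spec_find(haystack, needle, start=0):
    """Hypothetical helper: smallest i >= start with haystack[i:i+len(needle)] == needle."""
    if start < 0:
        raise ValueError("this sketch only handles a non-negative start")
    if needle not in haystack[start:]:
        return -1
    # Walk forward until the window of len(needle) characters matches.
    i = start
    while haystack[i:i + len(needle)] != needle:
        i += 1
    return i


print(spec_find("spam", "", 5))  # 5
```

Because the loop only starts after the `in` test succeeds, it always terminates.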

The only other consistent position would be that "spam"[5:]
should raise an IndexError, because 5 is an invalid index.

For that matter, I wouldn't mind if "spam".find(s, 5) were
to raise an IndexError.  But if slicing at position 5
produces an empty string, then .find should be able to
find that empty string.
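For comparison, the IndexError alternative might look like this hypothetical wrapper around the built-in method. Note that plain slicing never raises here: `"spam"[5:]` is simply `""`.

```python
def strict_find(s, sub, start=0):
    """Hypothetical variant: reject a start past the end instead of returning -1."""
    if start > len(s):
        raise IndexError(f"start {start} is past the end of a string of length {len(s)}")
    return s.find(sub, start)


print(strict_find("spam", "", 4))  # 4
try:
    strict_find("spam", "", 5)
except IndexError as exc:
    print("IndexError:", exc)
```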

You'd expect that given:

    found = string.find(something, start, end)

if 'something' is present, then the following are true:

    0 <= found <= len(string)

    start <= found <= end

(I'm assuming here that 'start' and 'end' have already been adjusted
for counting from the end, i.e. originally they might have been negative
values.)

The only time that you can have found == len(string) and found == end
is when something == "" and start == len(string).
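These invariants can be spot-checked against the built-in `str.find` with a brute-force loop. The `adjust` helper is an assumption about the normalisation meant above: negative indices count from the end, and `end` is additionally clamped to `len(string)`.

```python
def adjust(index, n, is_end):
    """Normalise a start/end argument; negative values count from the end."""
    if index < 0:
        return max(0, index + n)
    # ASSUMPTION: end is clamped to len(string); start is left as given.
    return min(index, n) if is_end else index


def check_invariants(string, something, start, end):
    found = string.find(something, start, end)
    if found == -1:
        return  # only successful searches are checked
    n = len(string)
    lo, hi = adjust(start, n, False), adjust(end, n, True)
    assert 0 <= found <= n
    assert lo <= found <= hi
    if found == n:
        # Only reachable with an empty needle searched from the very end.
        assert something == "" and lo == n


for string in ["", "a", "spam", "aaa"]:
    for something in ["", "a", "am", "aa", "spam", "x"]:
        for start in range(-6, 7):
            for end in range(-6, 7):
                check_invariants(string, something, start, end)
print("all invariants hold")
```

The -1 results, including "spam".find("", 5), are skipped by the checker rather than treated as violations.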

--
http://mail.python.org/mailman/listinfo/python-list
