"Ben Cartwright" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Your evidence points to some unoptimized code in the underlying C > implementation of Python. As such, this should probably go to the > python-dev list (http://mail.python.org/mailman/listinfo/python-dev). > > The problem is that the C library function memcmp is slow, and > str.count calls it frequently. See lines 2165+ in stringobject.c > (inside function string_count): > > r = 0; > while (i < m) { > if (!memcmp(s+i, sub, n)) { > r++; > i += n; > } else { > i++; > } > } > > This could be optimized as: > > r = 0; > while (i < m) { > if (s[i] == *sub && !memcmp(s+i, sub, n)) { > r++; > i += n; > } else { > i++; > } > } > > This tactic typically avoids most (sometimes all) of the calls to > memcmp. Other string search functions, including unicode.count, > unicode.index, and str.index, use this tactic, which is why you see > unicode.count performing better than str.count.
If not doing the same in str.count is indeed an oversight, a patch should be welcome (on the SF tracker).