New submission from STINNER Victor:

The attached patch optimizes the a==b and a!=b operators for the bytes and str 
types in Python 3.4. For str, memcmp() is now always used, instead of a loop 
using PyUnicode_READ() (which is slow) for kinds other than 1. For bytes, the 
first and also the last byte are compared before calling memcmp(), instead of 
just the first byte. A similar optimization was implemented in 
Py_UNICODE_MATCH():

changeset:   38242:0de9a789de39
branch:      legacy-trunk
user:        Fredrik Lundh <fred...@pythonware.com>
date:        Tue May 23 10:10:57 2006 +0000
files:       Include/unicodeobject.h
description:
needforspeed: check first *and* last character before doing a full memcmp
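
The first-and-last-byte check described for bytes can be sketched as follows. 
This is only an illustration, not the code from the patch: bytes_equal() is a 
hypothetical helper working on plain buffers rather than on PyBytesObject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return 1 if the two buffers
   hold the same bytes, 0 otherwise. */
static int
bytes_equal(const unsigned char *a, size_t alen,
            const unsigned char *b, size_t blen)
{
    assert(a != NULL && b != NULL);

    /* Buffers of different lengths can never be equal. */
    if (alen != blen)
        return 0;
    /* Two empty buffers are equal; this also guards a[alen - 1] below. */
    if (alen == 0)
        return 1;
    /* Cheap rejection: compare the first and the last byte before
       paying for a full memcmp(). */
    if (a[0] != b[0] || a[alen - 1] != b[alen - 1])
        return 0;
    return memcmp(a, b, alen) == 0;
}
```

The extra comparison of the last byte helps when many values share a common 
prefix and differ only near the end.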

Initially I wrote the patch only to check the hash values before comparing 
the content of the strings.
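
A minimal sketch of such a hash check, under stated assumptions: demo_str and 
demo_str_equal() are hypothetical names for illustration, not the structures 
used by the patch. The cached hash is assumed to be -1 while it has not been 
computed yet, which is the convention CPython uses for bytes and str objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string object with a cached hash. */
typedef struct {
    long hash;         /* cached hash, or -1 if not computed yet */
    size_t len;        /* length of data in bytes */
    const char *data;  /* content */
} demo_str;

/* Return 1 if a and b have equal content, 0 otherwise. */
static int
demo_str_equal(const demo_str *a, const demo_str *b)
{
    assert(a != NULL && b != NULL);

    if (a == b)
        return 1;
    if (a->len != b->len)
        return 0;
    /* If both hashes are known and differ, the contents must differ,
       so the memcmp() can be skipped. Equal hashes prove nothing. */
    if (a->hash != -1 && b->hash != -1 && a->hash != b->hash)
        return 0;
    return memcmp(a->data, b->data, a->len) == 0;
}
```

The shortcut can only reject: when either hash is unknown, or when both 
hashes are equal, the content still has to be compared.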

--

I ran some statistical tests. For a fresh Python interpreter, the hash values 
are known in only 7% of cases (but when hashes are compared, they are almost 
always different, so the optimization is useful). When running "./python -m 
test test_os", hashes are known and different in 41.4% of cases. After running 
70 tests, hashes are known and different in 80% of cases.

----------
files: compare_hash.patch
keywords: patch
messages: 173332
nosy: haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Optimize a==b and a!=b for bytes and str
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file27623/compare_hash.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16286>
_______________________________________