Xiang Zhang added the comment: I ran the patch against a toy NLP application, cutting words from Shui Hu Zhuan, provided by Serhiy. The result is not bad: 6% faster. I also counted the hit rate: 90% hit cell 0, 4.5% hit cell 1, 5.5% miss. I then increased the cache size to 1024 * 2. Although the hit rates changed to 95.4%, 2.1%, 2.4%, the speedup is still about 6%.
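For illustration, the "cell 0 / cell 1 / miss" counting above could look like the following. This is only a hedged Python sketch, not the patch's actual C code: it assumes a direct-mapped table keyed by the code point's low bits, where each slot holds two entries kept in mini-LRU order (the names `MiniLRUCharCache` and `get_char` are hypothetical):

```python
# Sketch of a single-character cache: each slot has two cells managed
# as a mini LRU; cell 0 is the most recently used entry.
class MiniLRUCharCache:
    def __init__(self, size=1024):
        self.size = size                         # assumed power of two
        self.slots = [[None, None] for _ in range(size)]
        self.hit0 = self.hit1 = self.miss = 0

    def get_char(self, cp):
        cell = self.slots[cp & (self.size - 1)]  # low bits pick the slot
        if cell[0] is not None and ord(cell[0]) == cp:
            self.hit0 += 1                       # hit in cell 0 (MRU)
            return cell[0]
        if cell[1] is not None and ord(cell[1]) == cp:
            self.hit1 += 1                       # hit in cell 1; promote it
            cell[0], cell[1] = cell[1], cell[0]
            return cell[0]
        self.miss += 1                           # miss: build and insert
        s = chr(cp)
        cell[1] = cell[0]                        # demote previous MRU
        cell[0] = s
        return s

cache = MiniLRUCharCache()
# 'A' (0x41) and U+4441 share the same low 10 bits, so without the
# second cell they would simply evict each other:
cache.get_char(0x41)      # miss
cache.get_char(0x4441)    # miss; 'A' demoted to cell 1
cache.get_char(0x41)      # hit in cell 1, promoted back to cell 0
```

The second cell is what lets an ASCII char and a BMP char with colliding low bits coexist instead of thrashing a single slot.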
So IMHO this patch could hardly affect that many real-world applications *much*, for better or worse. I can't clearly recall the unicode implementation, but why can't we reuse the latin1 cache when we use this bmp cache? As it stands, to avoid the BMP chars' low bits conflicting with the ASCII chars' low bits we have to introduce the mini-LRU-cache, which is not that easy to understand.

----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31484>
_______________________________________