Re: [PHP-DEV] Re: Zend engine's hashtable performance tweaks

Marcin Babij Sat, 01 Jan 2011 04:31:46 -0800

Hi!
Sorry for the missing attachments in my previous message; I think they
were dropped when the lists.php.net email confirmation system redirected
the message. I am sending them again and, to be sure, I am also adding
links to public copies of them over HTTP:
https://gist.github.com/761094 - php-5.3.4-hashtable-optimization.patch
https://gist.github.com/761096 - apc-3.1.6-hashtable-optimization.patch
I just took a quick look, so some preliminary notes:
1. +#define ZEND_HASH_SMALL_LOAD_FACTOR_INV 1.75 etc. - maybe it makes sense to convert them from floats to a couple of integers? I'm not sure if gcc is smart enough not to use floating point here, which would probably be slower than int *7/4.

My idea here was that *7 could lead to overflow on integer values, and the division can't be done first without losing precision. The way out is casting to int64 or float. And I found specifying one float more human-readable than two integers in relation.
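To illustrate the integer alternative discussed above, here is a minimal sketch of computing n * 7 / 4 without a floating-point constant and without overflowing the intermediate product. The names LOAD_FACTOR_INV_NUM, LOAD_FACTOR_INV_DEN and scale_by_load_factor_inv are hypothetical and do not come from the patch; it only assumes the 1.75 factor quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: the patch itself defines a single float constant,
 * ZEND_HASH_SMALL_LOAD_FACTOR_INV 1.75. Here 1.75 is written as 7/4. */
#define LOAD_FACTOR_INV_NUM 7u
#define LOAD_FACTOR_INV_DEN 4u

/* Returns floor(n * 7 / 4) using only 32-bit integer arithmetic.
 * Writing n = 4q + r gives n * 7 / 4 = 7q + 7r/4, so the large product
 * n * 7 is never formed; only the final result has to fit in 32 bits. */
static uint32_t scale_by_load_factor_inv(uint32_t n)
{
    uint32_t q = n / LOAD_FACTOR_INV_DEN;
    uint32_t r = n % LOAD_FACTOR_INV_DEN;

    /* Conservative guard: 7q + 5 is the largest possible result for this q. */
    assert(q <= (UINT32_MAX - 5u) / LOAD_FACTOR_INV_NUM);

    return q * LOAD_FACTOR_INV_NUM
         + (r * LOAD_FACTOR_INV_NUM) / LOAD_FACTOR_INV_DEN;
}

/* The "cast to int64" way out mentioned above, for comparison. */
static uint32_t scale_by_load_factor_inv_wide(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * LOAD_FACTOR_INV_NUM) / LOAD_FACTOR_INV_DEN);
}
```

Both functions give the same result whenever that result fits in 32 bits; the first one avoids both the wider type and the float, at the cost of one extra division.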

2. Would it be possible to clean up the patch? There are tons of diffs like this:
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
which make no sense and just make it harder to understand what's going on.

Possibly they're some whitespace changes. I'll create a bug report and attach patches with such "changes" removed.


3.
- ulong h; /* Used for numeric indexing */
+ ulong h;

why?

It should be back; this comment still remains valid. It looks like it wasn't at some point in the evolution of the patch. :) AFAIR it was removed when it turned out that with open addressing we can't use subsequent buckets like we used to do with numeric indexes. Then I changed that by adding an additional hash->bucket index macro, ZEND_HASH_BUCKET, that prevents clustering.
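As an illustration of why a separate hash->bucket mapping helps with open addressing, here is a small sketch that scrambles the hash before reducing it to a table index, so that consecutive numeric keys do not land in consecutive buckets. This is NOT the definition of ZEND_HASH_BUCKET from the patch, which is not shown in this message; bucket_index is a hypothetical stand-in using multiplicative (Fibonacci) hashing, and it assumes a power-of-two table size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a hash->bucket mapping; not taken from the patch.
 * Multiplies by 2^64 / golden ratio and keeps the top table_bits bits, so
 * keys 0, 1, 2, 3, ... are spread across the table instead of forming one
 * run of adjacent buckets, which is what hurts linear probing. */
static uint32_t bucket_index(uint64_t h, unsigned table_bits)
{
    /* Table size is 2^table_bits; the shift below needs 1..32 bits. */
    assert(table_bits >= 1 && table_bits <= 32);

    return (uint32_t)((h * UINT64_C(0x9E3779B97F4A7C15)) >> (64 - table_bits));
}
```

With a 16-slot table (table_bits = 4), the keys 0, 1, 2, 3 map to slots 0, 9, 3, 13, whereas a plain h & 15 mask would place them in slots 0, 1, 2, 3.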

4. zend_inline_hash_func - could you describe why you changed it? In any case, it needs some comments; if you deleted the old one, please add a description of the new one and why it is better.

I'll provide some comments in the patch. For now I can say that the main idea was to handle long keys (>= 8 bytes long) in a new way: take sizeof(ulong) bytes at a time, hash them into the hash variable, and, to add the remaining bytes, just take the last sizeof(ulong) bytes. This takes far fewer instructions to execute, yet should keep the hash function's collision properties. Also, I don't care (why should I?) that if the key length isn't a multiple of sizeof(ulong) I'll count some bytes twice.

I would like to take the same approach when nKeyLength < sizeof(ulong), which would speed up the hashing function a lot for short (most common) keys, but that requires some byte-masking and assumes that the key's starting address is aligned to sizeof(ulong); otherwise it could read bytes outside the page and cause a segmentation fault. Could we make such an assumption, maybe just on specific architectures, to speed it up more?
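The word-at-a-time idea described above can be sketched roughly as follows. This is not the code from the patch: wordwise_hash and ulong_t are hypothetical names, and the mixing step (h * 33 + word, seeded with 5381 as in the byte-wise DJBX33A hash) is only an assumption chosen for illustration. The sketch uses memcpy for the loads, so it does not rely on the key being aligned, and it falls back to a byte loop for short keys instead of the masking approach asked about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long ulong_t; /* hypothetical alias for the engine's ulong */

/* Sketch: hash sizeof(ulong_t) bytes per step; for a trailing partial word,
 * re-read the LAST sizeof(ulong_t) bytes of the key, so a few bytes may be
 * counted twice, as described above. */
static ulong_t wordwise_hash(const char *key, size_t len)
{
    ulong_t h = 5381;
    size_t i = 0;

    assert(key != NULL || len == 0);

    if (len < sizeof(ulong_t)) {
        /* Short keys: plain byte-wise loop, no out-of-bounds reads. */
        for (; i < len; i++) {
            h = h * 33 + (unsigned char)key[i];
        }
        return h;
    }

    /* Full words; memcpy keeps the load safe for unaligned keys. */
    for (; i + sizeof(ulong_t) <= len; i += sizeof(ulong_t)) {
        ulong_t w;
        memcpy(&w, key + i, sizeof w);
        h = h * 33 + w;
    }

    /* Leftover bytes: take the final word of the key, overlapping
     * the previous one when len is not a multiple of the word size. */
    if (i < len) {
        ulong_t w;
        memcpy(&w, key + len - sizeof w, sizeof w);
        h = h * 33 + w;
    }
    return h;
}
```

Because the tail load starts at key + len - sizeof(ulong_t), it always stays inside the key, which is why this trick only applies when the key is at least one word long.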


Will follow up after reading the patch more in depth.


Thank you for your comments.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
