Hi!
Sorry for no attachments in previous message, I think my attachments
weren't redirected with message by lists.php.net email confirmation
system. I send them again, and for sure I attach links to public copy of
them over HTTP:
https://gist.github.com/761094 - php-5.3.4-hashtable-optimization.patch
https://gist.github.com/761096 - apc-3.1.6-hashtable-optimization.patch
I just took a quick look, so some preliminary notes:
1. +#define ZEND_HASH_SMALL_LOAD_FACTOR_INV 1.75 etc. - maybe makes
sense to convert them from floats to couple of integers? I'm not sure if
gcc is smart enough not to use floating point here, which would probably
be slower than int *7/4.
My idea here was that *7 could lead to overflow on integer values, and
division can't be made first. Way out is casting to int64 or float. And I
found specifying one float more human-readable, than 2 integers in
relation.
2. Would it be possible to clean up the patch? There are tons of diffs
like this:
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
which make no sense and just make it harder to understand what's going
on.
Possibly they're some whitespace changes. I'll create bug report, and
attach patches with removed such "changes".
3.
- ulong h; /* Used for numeric indexing */
+ ulong h;
why?
It should be back, this comment still remains valid, looks like it wasn't
at some time of evolution of patch. :)
AFAIR it was removed, when it turned out, that with open addressing we
can't use subsequent buckets like we used to do with numeric indexes. Then
I've changed that by making additional hash->bucket index macro
ZEND_HASH_BUCKET that prevents clustering.
4. zend_inline_hash_func - could you describe why you changed it? In any
case, it needs some comments, if you deleted the old one, please add
description of the new one any why it is better.
I'll provide some comments in patch. Now I can tell that the main idea was
to handle long keys (>= 8 bytes long) in new way: take sizeof(ulong) bytes
at once, hash them into hash variable, and to add remaining bytes, just
take last sizeof(ulong) bytes. This makes much less instructions to
execute, yet should keep hash function collision properties. Also, I don't
care (why should I?) that if key length isn't multiple of sizeof(ulong)
I'll count some bytes twice.
I would like to take such approach when nKeyLength < sizeof(ulong), which
speeds hashing function a lot for short (most common) keys, but that needs
to do some byte-masking and assumes that key's starting address is aligned
to sizeof(ulong), otherwise it could read bytes out of page and lead to
segmentation fault. Could we take such assumption, maybe just on specified
architectures, to speed it up more?
Will follow up after reading the patch more in depth.
Thank you for your comments.
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php