Sorry, I've attached .patch files, I'm attaching them as .txt now.
hi,
Did you forget to attach the patch? Attach it as .txt so the list
won't reject it.
Cheers,
On Fri, Dec 31, 2010 at 3:40 PM, Marcin Babij
<marcin.ba...@nasza-klasa.pl> wrote:
Hello,
I work for social network company, where we were running optimization
project. One of it's results is patch to Zend engine's Hashtable, which
we
want to share and ask you for comments and improvements.
Why we do this?
We run profiling on our production servers and found out that
zend_hash_*
functions take 10-20% CPU time of request. So there is some room for
easy
improvements.
What was done?
- Hash function in zend_hash.h was rebuilt and became much faster,
without
losing the most important properties.
- Hashtable implementation was changed from Simple chaining to Open
addressing with linear probing, but with linked bucket, not included in
hash
array, which causes:
-- Bucket structure to lose 2 pointers.
-- Searching works similar, but don't have to jump with pointers stored
in
different memory locations, inserting, deleting and rehashing don't
need to
update linked list, but must search for first empty bucket, which is
fast,
because it scans continuous memory.
-- Load factor decreases from 1.0 to 0.5-0.75 to make less collisions
and
faster hashtable, which in turn increases memory footprint a little.
- Open addressing doesn't change significantly performance, but next
thing
was to create new array (arEmpty), which is of size nTableSize bytes,
which
keeps track of used/empty buckets and makes inserting and rehashing much
faster. In future it can be tested as bit-array with size of
nTableSize/8
bytes.
- More macros were added to replace repetitive constructs.
- New constants were added to allow:
-- Creating new hashtables of size at least X (where 4 and 8 are
reasonable), which makes no rehashing and reallocing memory while
changing
size to 2 and then to 4.
-- For small tables it's better to extend them by a factor of 4 times,
not
2, to make rehashing cost smaller for most hashtables, of cost of little
higher memory consumption.
-- For large tables it's better to have other load factor, closer to 1,
while for small tables it's better to use load factor closer to 0.5.
- APC was patched to take changes in Bucket structure into account.
How was it tested?
It was tested with make test, where one more (comparing to original
sources)
test fails, but it's most probably because
http://bugs.php.net/bug.php?id=48858 - IMO test is badly constructed
(is too
simple) and any change of hashing function makes it fail. Also it was
tested
on our testing environment and production servers against >30mln
requests to
our site, with 120requests/s at peak on Xeon @ 2.50GHz with 8GB RAM
running
Debian Linux.
What is the gain?
After tests CPU usage dropped by about 4% -6%.
Memory footprint goes up just by few percent.
What can be done in future?
- Make new constants configurable by php.ini.
- Test if changing arEmpty from byte-array to bit-array helps on
performance.
- Tweak default constants' values using some real-live benchmarks.
- Prove (or modify and prove) hash function to have property, that it
has no
collisions if two keys don't differ on no more than 6 bytes, which will
lead
to memcmp omit first (or last) 6 bytes of key. Also simpler thing may be
proven, that is it has no collisions if two keys are not longer than 6
bytes, which will make most string keys omit memcpy at all.
The patch was created and tested against php-5.3.0, apc-3.1.3p1, then
merged
with php-5.3.4, apc-3.1.6 without conflicts, and for these last versions
patches are attached. Also, it shouldn't conflict with
http://wiki.php.net/rfc/performanceimprovements .
--
Marcin Babij
nasza-klasa.pl
Index: Zend/zend_hash.c
===================================================================
--- Zend/zend_hash.c 2010-12-30 17:09:14.000000000 +0100
+++ Zend/zend_hash.c 2010-12-31 01:45:14.000000000 +0100
@@ -5,7 +5,7 @@
| Copyright (c) 1998-2010 Zend Technologies Ltd. (http://www.zend.com) |
+----------------------------------------------------------------------+
| This source file is subject to version 2.00 of the Zend license, |
- | that is bundled with this package in the file LICENSE, and is |
+ | that is bundled with this package in the file LICENSE, and is |
| available through the world-wide-web at the following url: |
| http://www.zend.com/license/2_00.txt. |
| If you did not receive a copy of the Zend license and are unable to |
@@ -21,12 +21,16 @@
#include "zend.h"
-#define CONNECT_TO_BUCKET_DLLIST(element, list_head) \
- (element)->pNext = (list_head);
\
- (element)->pLast = NULL;
\
- if ((element)->pNext) {
\
- (element)->pNext->pLast = (element);
\
- }
+#define ZEND_HASH_SMALL_TABLE_SIZE 64
+#define ZEND_HASH_MIN_TABLE_SIZE 4
+#define ZEND_HASH_SMALL_LOAD_FACTOR_TABLE_SIZE 16
+#define ZEND_HASH_SMALL_LOAD_FACTOR_INV 1.75
+#define ZEND_HASH_LOAD_FACTOR_INV 1.25
+
+#define EQ_INDEX(ptr,index) \
+ (EXPECTED(((ptr)->nKeyLength == 0) && (ptr)->h == index))
+
+#define CONNECT_TO_BUCKET_DLLIST(element, list_head)
#define CONNECT_TO_GLOBAL_DLLIST(element, ht) \
(element)->pListLast = (ht)->pListTail;
\
@@ -91,7 +95,9 @@
#define ZEND_HASH_IF_FULL_DO_RESIZE(ht) \
- if ((ht)->nNumOfElements > (ht)->nTableSize) { \
+ if ((ht)->nTableSize <= ZEND_HASH_SMALL_LOAD_FACTOR_TABLE_SIZE ? \
+ ((ht)->nNumOfElements*ZEND_HASH_SMALL_LOAD_FACTOR_INV >
(ht)->nTableSize) : \
+ ((ht)->nNumOfElements*ZEND_HASH_LOAD_FACTOR_INV >
(ht)->nTableSize)) { \
zend_hash_do_resize(ht);
\
}
@@ -102,7 +108,6 @@
return zend_inline_hash_func(arKey, nKeyLength);
}
-
#define UPDATE_DATA(ht, p, pData, nDataSize)
\
if (nDataSize == sizeof(void*)) {
\
if ((p)->pData != &(p)->pDataPtr) {
\
@@ -136,11 +141,10 @@
}
-
ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t
pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
uint i = 3;
- Bucket **tmp;
+ void *tmp;
SET_INCONSISTENT(HT_OK);
@@ -151,7 +155,10 @@
while ((1U << i) < nSize) {
i++;
}
- ht->nTableSize = 1 << i;
+ ht->nTableSize = 1 << (i+1);
+ if (ht->nTableSize < ZEND_HASH_MIN_TABLE_SIZE) {
+ ht->nTableSize = ZEND_HASH_MIN_TABLE_SIZE;
+ }
}
ht->nTableMask = ht->nTableSize - 1;
@@ -165,21 +172,15 @@
ht->persistent = persistent;
ht->nApplyCount = 0;
ht->bApplyProtection = 1;
-
- /* Uses ecalloc() so that Bucket* == NULL */
- if (persistent) {
- tmp = (Bucket **) calloc(ht->nTableSize, sizeof(Bucket *));
- if (!tmp) {
- return FAILURE;
- }
- ht->arBuckets = tmp;
- } else {
- tmp = (Bucket **) ecalloc_rel(ht->nTableSize, sizeof(Bucket *));
- if (tmp) {
- ht->arBuckets = tmp;
- }
+
+ tmp = pemalloc_rel(ht->nTableSize * (sizeof(Bucket*)+1),
ht->persistent);
+ if (!tmp) {
+ return FAILURE;
}
-
+ ht->arBuckets = (Bucket**)tmp;
+ ht->arEmpty = ((uchar*)tmp) + (ht->nTableSize * (sizeof(Bucket*)));
+ memset(ht->arEmpty, 0, ht->nTableSize * 1);
+
return SUCCESS;
}
@@ -216,38 +217,34 @@
}
h = zend_inline_hash_func(arKey, nKeyLength);
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
- if (!memcmp(p->arKey, arKey, nKeyLength)) {
- if (flag & HASH_ADD) {
- return FAILURE;
- }
- HANDLE_BLOCK_INTERRUPTIONS();
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_HASH_KEYS(p, arKey, h, nKeyLength)) {
+ if (flag & HASH_ADD) {
+ return FAILURE;
+ }
+ HANDLE_BLOCK_INTERRUPTIONS();
#if ZEND_DEBUG
- if (p->pData == pData) {
- ZEND_PUTS("Fatal error in
zend_hash_update: p->pData == pData\n");
- HANDLE_UNBLOCK_INTERRUPTIONS();
- return FAILURE;
- }
-#endif
- if (ht->pDestructor) {
- ht->pDestructor(p->pData);
- }
- UPDATE_DATA(ht, p, pData, nDataSize);
- if (pDest) {
- *pDest = p->pData;
- }
+ if (p->pData == pData) {
+ ZEND_PUTS("Fatal error in zend_hash_update:
p->pData == pData\n");
HANDLE_UNBLOCK_INTERRUPTIONS();
- return SUCCESS;
+ return FAILURE;
+ }
+#endif
+ if (ht->pDestructor) {
+ ht->pDestructor(p->pData);
+ }
+ UPDATE_DATA(ht, p, pData, nDataSize);
+ if (pDest) {
+ *pDest = p->pData;
}
+ HANDLE_UNBLOCK_INTERRUPTIONS();
+ return SUCCESS;
}
- p = p->pNext;
}
-
- p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength,
ht->persistent);
+
+ p = (Bucket *)pemalloc_rel(sizeof(Bucket) - 1 + nKeyLength,
ht->persistent);
if (!p) {
return FAILURE;
}
@@ -262,11 +259,11 @@
HANDLE_BLOCK_INTERRUPTIONS();
CONNECT_TO_GLOBAL_DLLIST(p, ht);
- ht->arBuckets[nIndex] = p;
+ ZEND_HASH_FILL_BUCKET(ht, nIndex, p);
HANDLE_UNBLOCK_INTERRUPTIONS();
ht->nNumOfElements++;
- ZEND_HASH_IF_FULL_DO_RESIZE(ht); /* If the Hash table is
full, resize it */
+ ZEND_HASH_IF_FULL_DO_RESIZE(ht);
return SUCCESS;
}
@@ -281,38 +278,34 @@
return zend_hash_index_update(ht, h, pData, nDataSize, pDest);
}
- nIndex = h & ht->nTableMask;
-
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
- if (!memcmp(p->arKey, arKey, nKeyLength)) {
- if (flag & HASH_ADD) {
- return FAILURE;
- }
- HANDLE_BLOCK_INTERRUPTIONS();
+ nIndex = ZEND_HASH_BUCKET(ht, h);
+
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_HASH_KEYS(p, arKey, h, nKeyLength)) {
+ if (flag & HASH_ADD) {
+ return FAILURE;
+ }
+ HANDLE_BLOCK_INTERRUPTIONS();
#if ZEND_DEBUG
- if (p->pData == pData) {
- ZEND_PUTS("Fatal error in
zend_hash_update: p->pData == pData\n");
- HANDLE_UNBLOCK_INTERRUPTIONS();
- return FAILURE;
- }
-#endif
- if (ht->pDestructor) {
- ht->pDestructor(p->pData);
- }
- UPDATE_DATA(ht, p, pData, nDataSize);
- if (pDest) {
- *pDest = p->pData;
- }
+ if (p->pData == pData) {
+ ZEND_PUTS("Fatal error in zend_hash_update:
p->pData == pData\n");
HANDLE_UNBLOCK_INTERRUPTIONS();
- return SUCCESS;
+ return FAILURE;
}
+#endif
+ if (ht->pDestructor) {
+ ht->pDestructor(p->pData);
+ }
+ UPDATE_DATA(ht, p, pData, nDataSize);
+ if (pDest) {
+ *pDest = p->pData;
+ }
+ HANDLE_UNBLOCK_INTERRUPTIONS();
+ return SUCCESS;
}
- p = p->pNext;
}
-
- p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength,
ht->persistent);
+
+ p = (Bucket *)pemalloc_rel(sizeof(Bucket) - 1 + nKeyLength,
ht->persistent);
if (!p) {
return FAILURE;
}
@@ -321,7 +314,7 @@
p->nKeyLength = nKeyLength;
INIT_DATA(ht, p, pData, nDataSize);
p->h = h;
-
+
CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
if (pDest) {
@@ -329,12 +322,12 @@
}
HANDLE_BLOCK_INTERRUPTIONS();
- ht->arBuckets[nIndex] = p;
+ ZEND_HASH_FILL_BUCKET(ht, nIndex, p);
CONNECT_TO_GLOBAL_DLLIST(p, ht);
HANDLE_UNBLOCK_INTERRUPTIONS();
ht->nNumOfElements++;
- ZEND_HASH_IF_FULL_DO_RESIZE(ht); /* If the Hash table is
full, resize it */
+ ZEND_HASH_IF_FULL_DO_RESIZE(ht);
return SUCCESS;
}
@@ -357,11 +350,10 @@
if (flag & HASH_NEXT_INSERT) {
h = ht->nNextFreeElement;
}
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->nKeyLength == 0) && (p->h == h)) {
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_INDEX(p, h)) {
if (flag & HASH_NEXT_INSERT || flag & HASH_ADD) {
return FAILURE;
}
@@ -386,7 +378,6 @@
}
return SUCCESS;
}
- p = p->pNext;
}
p = (Bucket *) pemalloc_rel(sizeof(Bucket) - 1, ht->persistent);
if (!p) {
@@ -402,7 +393,7 @@
CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
HANDLE_BLOCK_INTERRUPTIONS();
- ht->arBuckets[nIndex] = p;
+ ZEND_HASH_FILL_BUCKET(ht, nIndex, p);
CONNECT_TO_GLOBAL_DLLIST(p, ht);
HANDLE_UNBLOCK_INTERRUPTIONS();
@@ -414,27 +405,28 @@
return SUCCESS;
}
-
static int zend_hash_do_resize(HashTable *ht)
{
- Bucket **t;
+ void *t;
+ uint dest_size = ht->nTableSize <= ZEND_HASH_SMALL_TABLE_SIZE ?
(ht->nTableSize << 2) : (ht->nTableSize << 1);
IS_CONSISTENT(ht);
- if ((ht->nTableSize << 1) > 0) { /* Let's double the table size
*/
- t = (Bucket **) perealloc_recoverable(ht->arBuckets,
(ht->nTableSize << 1) * sizeof(Bucket *), ht->persistent);
- if (t) {
- HANDLE_BLOCK_INTERRUPTIONS();
- ht->arBuckets = t;
- ht->nTableSize = (ht->nTableSize << 1);
- ht->nTableMask = ht->nTableSize - 1;
- zend_hash_rehash(ht);
- HANDLE_UNBLOCK_INTERRUPTIONS();
- return SUCCESS;
+ if (dest_size > 0) { /* Let's change the table size */
+ t = perealloc(ht->arBuckets, dest_size * (sizeof(Bucket *) +
1), ht->persistent);
+ if (!t) {
+ return FAILURE;
}
- return FAILURE;
+ HANDLE_BLOCK_INTERRUPTIONS();
+ ht->nTableSize = dest_size;
+ ht->nTableMask = ht->nTableSize - 1;
+ ht->arBuckets = (Bucket**)t;
+ ht->arEmpty = ((uchar*)t) + (ht->nTableSize *
(sizeof(Bucket*)));
+ zend_hash_rehash(ht);
+ HANDLE_UNBLOCK_INTERRUPTIONS();
+ return SUCCESS;
}
- return SUCCESS;
+ return FAILURE;
}
ZEND_API int zend_hash_rehash(HashTable *ht)
@@ -444,17 +436,41 @@
IS_CONSISTENT(ht);
- memset(ht->arBuckets, 0, ht->nTableSize * sizeof(Bucket *));
+ memset(ht->arEmpty, 0, ht->nTableSize * 1);
+
p = ht->pListHead;
while (p != NULL) {
- nIndex = p->h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, p->h);
+ ZEND_HASH_FIND_FREE(ht, nIndex);
+ ZEND_HASH_FILL_BUCKET(ht, nIndex, p);
CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
- ht->arBuckets[nIndex] = p;
p = p->pListNext;
}
return SUCCESS;
}
+inline void zend_hash_free_bucket(HashTable *ht, uint nIndex) {
+ uint nIndexSlot = nIndex;
+ uint nSourceIndex;
+
+ ZEND_HASH_MAKE_ELEMENT_EMPTY(ht, nIndexSlot);
+ for(;;) {
+ ZEND_HASH_NEXT_INDEX(ht, nIndexSlot);
+ if (!ZEND_HASH_BUCKET_IS_OCCUPIED(ht, nIndexSlot)) {
+ break;
+ }
+ nSourceIndex = ZEND_HASH_BUCKET(ht,
ht->arBuckets[nIndexSlot]->h);
+ if ((nIndex < nIndexSlot) ?
+ ((nIndex < nSourceIndex) && (nSourceIndex <=
nIndexSlot)) :
+ ((nIndex < nSourceIndex) || (nSourceIndex <=
nIndexSlot))) {
+ continue;
+ }
+ ZEND_HASH_FILL_BUCKET(ht, nIndex, ht->arBuckets[nIndexSlot]);
+ nIndex = nIndexSlot;
+ ZEND_HASH_MAKE_ELEMENT_EMPTY(ht, nIndexSlot);
+ }
+}
+
ZEND_API int zend_hash_del_key_or_index(HashTable *ht, const char *arKey, uint
nKeyLength, ulong h, int flag)
{
uint nIndex;
@@ -465,27 +481,17 @@
if (flag == HASH_DEL_KEY) {
h = zend_inline_hash_func(arKey, nKeyLength);
}
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h)
- && (p->nKeyLength == nKeyLength)
- && ((p->nKeyLength == 0) /* Numeric index (short
circuits the memcmp() check) */
- || !memcmp(p->arKey, arKey, nKeyLength))) { /*
String index */
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_HASH_KEYS(p, arKey, h, nKeyLength)) {
HANDLE_BLOCK_INTERRUPTIONS();
- if (p == ht->arBuckets[nIndex]) {
- ht->arBuckets[nIndex] = p->pNext;
- } else {
- p->pLast->pNext = p->pNext;
- }
- if (p->pNext) {
- p->pNext->pLast = p->pLast;
- }
+
+ zend_hash_free_bucket(ht, nIndex);
+
if (p->pListLast != NULL) {
p->pListLast->pListNext = p->pListNext;
- } else {
- /* Deleting the head of the list */
+ } else {
ht->pListHead = p->pListNext;
}
if (p->pListNext != NULL) {
@@ -503,11 +509,11 @@
pefree(p->pData, ht->persistent);
}
pefree(p, ht->persistent);
+
HANDLE_UNBLOCK_INTERRUPTIONS();
ht->nNumOfElements--;
return SUCCESS;
}
- p = p->pNext;
}
return FAILURE;
}
@@ -559,7 +565,7 @@
}
pefree(q, ht->persistent);
}
- memset(ht->arBuckets, 0, ht->nTableSize*sizeof(Bucket *));
+ memset(ht->arEmpty, 0, ht->nTableSize*1);
ht->pListHead = NULL;
ht->pListTail = NULL;
ht->nNumOfElements = 0;
@@ -571,31 +577,31 @@
/* This function is used by the various apply() functions.
* It deletes the passed bucket, and returns the address of the
- * next bucket. The hash *may* be altered during that time, the
+ * next bucket. The hash *may* be altered during that time, the
* returned value will still be valid.
*/
static Bucket *zend_hash_apply_deleter(HashTable *ht, Bucket *p)
{
Bucket *retval;
- HANDLE_BLOCK_INTERRUPTIONS();
- if (p->pLast) {
- p->pLast->pNext = p->pNext;
- } else {
- uint nIndex;
+ uint nIndex;
- nIndex = p->h & ht->nTableMask;
- ht->arBuckets[nIndex] = p->pNext;
- }
- if (p->pNext) {
- p->pNext->pLast = p->pLast;
- } else {
- /* Nothing to do as this list doesn't have a tail */
+ nIndex = ZEND_HASH_BUCKET(ht, p->h);
+ for (; ht->arBuckets[nIndex] != p; ZEND_HASH_NEXT_INDEX(ht, nIndex)) {
+#ifdef ZEND_DEBUG
+ if (!ZEND_HASH_BUCKET_IS_OCCUPIED(ht, nIndex)) {
+ zend_output_debug_string(1, "No element to delete");
+ }
+#endif
}
+ HANDLE_BLOCK_INTERRUPTIONS();
+
+ zend_hash_free_bucket(ht, nIndex);
+
if (p->pListLast != NULL) {
p->pListLast->pListNext = p->pListNext;
- } else {
+ } else {
/* Deleting the head of the list */
ht->pListHead = p->pListNext;
}
@@ -655,9 +661,9 @@
SET_INCONSISTENT(HT_DESTROYED);
}
-/* This is used to recurse elements and selectively delete certain entries
- * from a hashtable. apply_func() receives the data and decides if the entry
- * should be deleted or recursion should be stopped. The following three
+/* This is used to recurse elements and selectively delete certain entries
+ * from a hashtable. apply_func() receives the data and decides if the entry
+ * should be deleted or recursion should be stopped. The following three
* return codes are possible:
* ZEND_HASH_APPLY_KEEP - continue
* ZEND_HASH_APPLY_STOP - stop iteration
@@ -674,7 +680,7 @@
p = ht->pListHead;
while (p != NULL) {
int result = apply_func(p->pData TSRMLS_CC);
-
+
if (result & ZEND_HASH_APPLY_REMOVE) {
p = zend_hash_apply_deleter(ht, p);
} else {
@@ -698,7 +704,7 @@
p = ht->pListHead;
while (p != NULL) {
int result = apply_func(p->pData, argument TSRMLS_CC);
-
+
if (result & ZEND_HASH_APPLY_REMOVE) {
p = zend_hash_apply_deleter(ht, p);
} else {
@@ -878,17 +884,12 @@
IS_CONSISTENT(ht);
h = zend_inline_hash_func(arKey, nKeyLength);
- nIndex = h & ht->nTableMask;
-
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
- if (!memcmp(p->arKey, arKey, nKeyLength)) {
- *pData = p->pData;
- return SUCCESS;
- }
+ nIndex = ZEND_HASH_BUCKET(ht, h);
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_HASH_KEYS(p, arKey, h, nKeyLength)) {
+ *pData = p->pData;
+ return SUCCESS;
}
- p = p->pNext;
}
return FAILURE;
}
@@ -905,17 +906,13 @@
IS_CONSISTENT(ht);
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
- if (!memcmp(p->arKey, arKey, nKeyLength)) {
- *pData = p->pData;
- return SUCCESS;
- }
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_HASH_KEYS(p, arKey, h, nKeyLength)) {
+ *pData = p->pData;
+ return SUCCESS;
}
- p = p->pNext;
}
return FAILURE;
}
@@ -930,16 +927,12 @@
IS_CONSISTENT(ht);
h = zend_inline_hash_func(arKey, nKeyLength);
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
- if (!memcmp(p->arKey, arKey, nKeyLength)) {
- return 1;
- }
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_HASH_KEYS(p, arKey, h, nKeyLength)) {
+ return 1;
}
- p = p->pNext;
}
return 0;
}
@@ -956,16 +949,12 @@
IS_CONSISTENT(ht);
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
- if (!memcmp(p->arKey, arKey, nKeyLength)) {
- return 1;
- }
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_HASH_KEYS(p, arKey, h, nKeyLength)) {
+ return 1;
}
- p = p->pNext;
}
return 0;
@@ -979,15 +968,13 @@
IS_CONSISTENT(ht);
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == 0)) {
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_INDEX(p, h)) {
*pData = p->pData;
return SUCCESS;
}
- p = p->pNext;
}
return FAILURE;
}
@@ -1000,14 +987,12 @@
IS_CONSISTENT(ht);
- nIndex = h & ht->nTableMask;
+ nIndex = ZEND_HASH_BUCKET(ht, h);
- p = ht->arBuckets[nIndex];
- while (p != NULL) {
- if ((p->h == h) && (p->nKeyLength == 0)) {
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
+ if (EQ_INDEX(p, h)) {
return 1;
}
- p = p->pNext;
}
return 0;
}
@@ -1041,13 +1026,12 @@
Bucket *p;
IS_CONSISTENT(ht);
- p = ht->arBuckets[ptr->h & ht->nTableMask];
- while (p != NULL) {
+ uint nIndex = ZEND_HASH_BUCKET(ht, ptr->h);
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
if (p == ptr->pos) {
ht->pInternalPointer = p;
return 1;
}
- p = p->pNext;
}
return 0;
}
@@ -1065,7 +1049,7 @@
}
-/* This function will be extremely optimized by remembering
+/* This function will be extremely optimized by remembering
* the end of the list
*/
ZEND_API void zend_hash_internal_pointer_end_ex(HashTable *ht, HashPosition
*pos)
@@ -1106,7 +1090,7 @@
}
-/* This function should be made binary safe */
+/* This function should be made binary safe */
ZEND_API int zend_hash_get_current_key_ex(const HashTable *ht, char
**str_index, uint *str_length, ulong *num_index, zend_bool duplicate,
HashPosition *pos)
{
Bucket *p;
@@ -1176,6 +1160,8 @@
ZEND_API int zend_hash_update_current_key_ex(HashTable *ht, int key_type,
const char *str_index, uint str_length, ulong num_index, int mode, HashPosition
*pos)
{
Bucket *p;
+ Bucket *q;
+ uint nIndex;
p = pos ? (*pos) : ht->pInternalPointer;
@@ -1184,18 +1170,18 @@
if (p) {
if (key_type == HASH_KEY_IS_LONG) {
str_length = 0;
- if (!p->nKeyLength && p->h == num_index) {
+ if (EQ_INDEX(p, num_index)) {
return SUCCESS;
}
if (mode != HASH_UPDATE_KEY_ANYWAY) {
- Bucket *q = ht->arBuckets[num_index &
ht->nTableMask];
+ nIndex = ZEND_HASH_BUCKET(ht, num_index);
int found = 0;
- while (q != NULL) {
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, q) {
if (q == p) {
found = 1;
- } else if (!q->nKeyLength && q->h ==
num_index) {
+ } else if (EQ_INDEX(q, num_index)) {
if (found) {
if (mode &
HASH_UPDATE_KEY_IF_BEFORE) {
break;
@@ -1220,28 +1206,26 @@
}
}
}
- q = q->pNext;
}
}
zend_hash_index_del(ht, num_index);
} else if (key_type == HASH_KEY_IS_STRING) {
if (p->nKeyLength == str_length &&
- memcmp(p->arKey, str_index, str_length) == 0) {
+ memcmp(p->arKey, str_index, str_length)
== 0) {
return SUCCESS;
}
if (mode != HASH_UPDATE_KEY_ANYWAY) {
ulong h = zend_inline_hash_func(str_index,
str_length);
- Bucket *q = ht->arBuckets[h & ht->nTableMask];
+ nIndex = ZEND_HASH_BUCKET(ht, h);
int found = 0;
- while (q != NULL) {
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, q) {
if (q == p) {
found = 1;
- } else if (q->h == h && q->nKeyLength
== str_length &&
- memcmp(q->arKey, str_index,
str_length) == 0) {
- if (found) {
+ } else if (EQ_HASH_KEYS(q, str_index,
h, str_length)) {
+ if (found) {
if (mode &
HASH_UPDATE_KEY_IF_BEFORE) {
break;
} else {
@@ -1265,7 +1249,6 @@
}
}
}
- q = q->pNext;
}
}
@@ -1276,13 +1259,12 @@
HANDLE_BLOCK_INTERRUPTIONS();
- if (p->pNext) {
- p->pNext->pLast = p->pLast;
- }
- if (p->pLast) {
- p->pLast->pNext = p->pNext;
- } else{
- ht->arBuckets[p->h & ht->nTableMask] = p->pNext;
+ nIndex = ZEND_HASH_BUCKET(ht, p->h);
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, q) {
+ if (p == q) {
+ zend_hash_free_bucket(ht, nIndex);
+ break;
+ }
}
if (p->nKeyLength != str_length) {
@@ -1324,8 +1306,10 @@
p->h = zend_inline_hash_func(str_index, str_length);
}
- CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[p->h &
ht->nTableMask]);
- ht->arBuckets[p->h & ht->nTableMask] = p;
+ nIndex = ZEND_HASH_BUCKET(ht, p->h);
+ CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
+ ZEND_HASH_FIND_FREE(ht, nIndex);
+ ZEND_HASH_FILL_BUCKET(ht, nIndex, p);
HANDLE_UNBLOCK_INTERRUPTIONS();
return SUCCESS;
@@ -1406,13 +1390,13 @@
IS_CONSISTENT(ht1);
IS_CONSISTENT(ht2);
- HASH_PROTECT_RECURSION(ht1);
- HASH_PROTECT_RECURSION(ht2);
+ HASH_PROTECT_RECURSION(ht1);
+ HASH_PROTECT_RECURSION(ht2);
result = ht1->nNumOfElements - ht2->nNumOfElements;
if (result!=0) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return result;
}
@@ -1423,29 +1407,29 @@
while (p1) {
if (ordered && !p2) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return 1; /* That's not supposed to happen */
}
if (ordered) {
if (p1->nKeyLength==0 && p2->nKeyLength==0) { /*
numeric indices */
result = p1->h - p2->h;
if (result!=0) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return result;
}
} else { /* string indices */
result = p1->nKeyLength - p2->nKeyLength;
if (result!=0) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return result;
}
result = memcmp(p1->arKey, p2->arKey,
p1->nKeyLength);
if (result!=0) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return result;
}
}
@@ -1453,22 +1437,22 @@
} else {
if (p1->nKeyLength==0) { /* numeric index */
if (zend_hash_index_find(ht2, p1->h,
&pData2)==FAILURE) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return 1;
}
} else { /* string index */
if (zend_hash_quick_find(ht2, p1->arKey,
p1->nKeyLength, p1->h, &pData2)==FAILURE) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return 1;
}
}
}
result = compar(p1->pData, pData2 TSRMLS_CC);
if (result!=0) {
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return result;
}
p1 = p1->pListNext;
@@ -1476,9 +1460,9 @@
p2 = p2->pListNext;
}
}
-
- HASH_UNPROTECT_RECURSION(ht1);
- HASH_UNPROTECT_RECURSION(ht2);
+
+ HASH_UNPROTECT_RECURSION(ht1);
+ HASH_UNPROTECT_RECURSION(ht2);
return 0;
}
@@ -1538,9 +1522,8 @@
for (i = 0; i < ht->nTableSize; i++) {
p = ht->arBuckets[i];
- while (p != NULL) {
+ if (ZEND_HASH_BUCKET_IS_OCCUPIED(ht, i)) {
zend_output_debug_string(0, "%s <==> 0x%lX\n",
p->arKey, p->h);
- p = p->pNext;
}
}
Index: Zend/zend_hash.h
===================================================================
--- Zend/zend_hash.h 2010-12-30 17:09:14.000000000 +0100
+++ Zend/zend_hash.h 2010-12-31 01:03:46.000000000 +0100
@@ -48,18 +48,17 @@
typedef void (*dtor_func_t)(void *pDest);
typedef void (*copy_ctor_func_t)(void *pElement);
typedef void (*copy_ctor_param_func_t)(void *pElement, void *pParam);
+typedef unsigned char uchar;
struct _hashtable;
typedef struct bucket {
- ulong h; /* Used for
numeric indexing */
+ ulong h;
uint nKeyLength;
void *pData;
void *pDataPtr;
struct bucket *pListNext;
struct bucket *pListLast;
- struct bucket *pNext;
- struct bucket *pLast;
char arKey[1]; /* Must be last element */
} Bucket;
@@ -72,6 +71,7 @@
Bucket *pListHead;
Bucket *pListTail;
Bucket **arBuckets;
+ uchar *arEmpty;
dtor_func_t pDestructor;
zend_bool persistent;
unsigned char nApplyCount;
@@ -124,6 +124,31 @@
ZEND_API int zend_hash_add_empty_element(HashTable *ht, const char *arKey,
uint nKeyLength);
+#define ZEND_HASH_BUCKET(ht, index) \
+ (((index) + ((index)<<4)) & (ht)->nTableMask)
+
+#define EQ_KEYS(a,b,n) (!memcmp((a), (b), (n)))
+#define EQ_HASH_KEYS(ptr,key,hash,len) (EXPECTED(((ptr)->h == (hash)) && \
+ ((ptr)->nKeyLength == (len)) && \
+ ((len) == 0 || EQ_KEYS((ptr)->arKey,(key),(len)))))
+
+#define ZEND_HASH_NEXT_INDEX(ht, index) \
+ (index) = (((index)+1) & (ht)->nTableMask)
+#define ZEND_HASH_BUCKET_IS_OCCUPIED(ht, index) \
+ ((ht)->arEmpty[(index)] != 0)
+// ((ht)->arBuckets[(index)] != NULL)
+#define ZEND_HASH_GET_BUCKET_AND_CHECK_OCCUPIED(ht, index, bucket) \
+ (bucket) = (ht)->arBuckets[(index)], ZEND_HASH_BUCKET_IS_OCCUPIED((ht),
(index))
+#define ZEND_HASH_ITERATE_UNTIL_FREE(ht, index, bucket) \
+ for (; ZEND_HASH_GET_BUCKET_AND_CHECK_OCCUPIED((ht), (index),
(bucket)); ZEND_HASH_NEXT_INDEX((ht), (index)))
+#define ZEND_HASH_FIND_FREE(ht, index) \
+ for (; ZEND_HASH_BUCKET_IS_OCCUPIED((ht), (index));
ZEND_HASH_NEXT_INDEX((ht), (index)))
+#define ZEND_HASH_FILL_BUCKET(ht, index, element) \
+ (ht)->arBuckets[(index)] = element; \
+ (ht)->arEmpty[(index)] = 1;
+#define ZEND_HASH_MAKE_ELEMENT_EMPTY(ht, index) \
+ (ht)->arEmpty[(index)] = 0;
+// (ht)->arBuckets[(index)] = 0;
#define ZEND_HASH_APPLY_KEEP 0
#define ZEND_HASH_APPLY_REMOVE 1<<0
@@ -225,54 +250,12 @@
ZEND_API int zend_hash_rehash(HashTable *ht);
-/*
- * DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
- *
- * This is Daniel J. Bernstein's popular `times 33' hash function as
- * posted by him years ago on comp.lang.c. It basically uses a function
- * like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
- * known hash functions for strings. Because it is both computed very
- * fast and distributes very well.
- *
- * The magic of number 33, i.e. why it works better than many other
- * constants, prime or not, has never been adequately explained by
- * anyone. So I try an explanation: if one experimentally tests all
- * multipliers between 1 and 256 (as RSE did now) one detects that even
- * numbers are not useable at all. The remaining 128 odd numbers
- * (except for the number 1) work more or less all equally well. They
- * all distribute in an acceptable way and this way fill a hash table
- * with an average percent of approx. 86%.
- *
- * If one compares the Chi^2 values of the variants, the number 33 not
- * even has the best value. But the number 33 and a few other equally
- * good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
- * advantage to the remaining numbers in the large set of possible
- * multipliers: their multiply operation can be replaced by a faster
- * operation based on just one shift plus either a single addition
- * or subtraction operation. And because a hash function has to both
- * distribute good _and_ has to be very fast to compute, those few
- * numbers should be preferred and seems to be the reason why Daniel J.
- * Bernstein also preferred it.
- *
- *
- * -- Ralf S. Engelschall <r...@engelschall.com>
- */
-
static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength)
{
- register ulong hash = 5381;
+ ulong hash = 5381;
+ const ulong* ar = (ulong*)arKey;
- /* variant with the hash unrolled eight times */
- for (; nKeyLength >= 8; nKeyLength -= 8) {
- hash = ((hash << 5) + hash) + *arKey++;
- hash = ((hash << 5) + hash) + *arKey++;
- hash = ((hash << 5) + hash) + *arKey++;
- hash = ((hash << 5) + hash) + *arKey++;
- hash = ((hash << 5) + hash) + *arKey++;
- hash = ((hash << 5) + hash) + *arKey++;
- hash = ((hash << 5) + hash) + *arKey++;
- hash = ((hash << 5) + hash) + *arKey++;
- }
+ if (nKeyLength < sizeof(ulong)) {
switch (nKeyLength) {
case 7: hash = ((hash << 5) + hash) + *arKey++; /*
fallthrough... */
case 6: hash = ((hash << 5) + hash) + *arKey++; /*
fallthrough... */
@@ -284,10 +267,17 @@
case 0: break;
EMPTY_SWITCH_DEFAULT_CASE()
}
- return hash;
+ } else {
+ const ulong* arEnd = (ulong*)(arKey + nKeyLength -
sizeof(ulong));
+// memcpy(&hash, arKey, sizeof(ulong)); // hash = *arEnd, but this
could be from unaligned address
+ hash = *arEnd;
+ for (; ar < arEnd; ar++) {
+ hash = ((hash << 5) + hash) + *ar;
+ }
+ }
+ return (hash<<16)|(hash%65165);
}
-
ZEND_API ulong zend_hash_func(const char *arKey, uint nKeyLength);
#if ZEND_DEBUG
Index: ext/spl/spl_array.c
===================================================================
--- ext/spl/spl_array.c 2010-12-30 17:07:19.000000000 +0100
+++ ext/spl/spl_array.c 2010-12-30 17:11:02.000000000 +0100
@@ -115,12 +115,11 @@
/* IS_CONSISTENT(ht);*/
/* HASH_PROTECT_RECURSION(ht);*/
- p = ht->arBuckets[intern->pos_h & ht->nTableMask];
- while (p != NULL) {
+ uint nIndex = ZEND_HASH_BUCKET(ht, intern->pos_h);
+ ZEND_HASH_ITERATE_UNTIL_FREE(ht, nIndex, p) {
if (p == intern->pos) {
return SUCCESS;
}
- p = p->pNext;
}
/* HASH_UNPROTECT_RECURSION(ht); */
spl_array_rewind(intern TSRMLS_CC);
Index: apc_bin.c
===================================================================
--- apc_bin.c 2010-11-30 11:18:31.000000000 +0100
+++ apc_bin.c 2010-12-30 17:11:03.000000000 +0100
@@ -412,12 +412,6 @@
if((*bp_prev)->pListLast) {
apc_swizzle_ptr(bd, ll, &(*bp_prev)->pListLast);
}
- if((*bp_prev)->pNext) {
- apc_swizzle_ptr(bd, ll, &(*bp_prev)->pNext);
- }
- if((*bp_prev)->pLast) {
- apc_swizzle_ptr(bd, ll, &(*bp_prev)->pLast);
- }
apc_swizzle_ptr(bd, ll, bp_prev);
}
for(i=0; i < ht->nTableSize; i++) {
Index: apc_compile.c
===================================================================
--- apc_compile.c 2010-11-30 11:18:31.000000000 +0100
+++ apc_compile.c 2010-12-30 17:12:28.000000000 +0100
@@ -800,6 +800,7 @@
Bucket* newp = NULL;
int first = 1;
apc_pool* pool = ctxt->pool;
+ void* t;
assert(src != NULL);
@@ -810,14 +811,17 @@
memcpy(dst, src, sizeof(src[0]));
/* allocate buckets for the new hashtable */
- CHECK((dst->arBuckets = apc_pool_alloc(pool, (dst->nTableSize *
sizeof(Bucket*)))));
+ CHECK((t = apc_pool_alloc(pool, (dst->nTableSize * (sizeof(Bucket*) +
1)))));
- memset(dst->arBuckets, 0, dst->nTableSize * sizeof(Bucket*));
+ memset(t, 0, dst->nTableSize * (sizeof(Bucket*) + 1));
+
+ dst->arBuckets = (Bucket**)t;
+ dst->arEmpty = ((uchar*)t) + (src->nTableSize * (sizeof(Bucket*)));
dst->pInternalPointer = NULL;
dst->pListHead = NULL;
for (curr = src->pListHead; curr != NULL; curr = curr->pListNext) {
- int n = curr->h % dst->nTableSize;
+ int n = ZEND_HASH_BUCKET(dst, curr->h);
if(check_fn) {
va_list args;
@@ -863,16 +867,8 @@
#endif
/* insert 'newp' into the linked list at its hashed index */
- if (dst->arBuckets[n]) {
- newp->pNext = dst->arBuckets[n];
- newp->pLast = NULL;
- newp->pNext->pLast = newp;
- }
- else {
- newp->pNext = newp->pLast = NULL;
- }
-
- dst->arBuckets[n] = newp;
+ ZEND_HASH_FIND_FREE(dst, n);
+ ZEND_HASH_FILL_BUCKET(dst, n, newp);
/* copy the bucket data using our 'copy_fn' callback function */
CHECK((newp->pData = copy_fn(NULL, curr->pData, ctxt TSRMLS_CC)));
@@ -1917,11 +1913,9 @@
uint i;
for (i = 0; i < ht->nTableSize; i++) {
- if(!ht->arBuckets) break;
- p = ht->arBuckets[i];
- while (p != NULL) {
+ if (ZEND_HASH_BUCKET_IS_OCCUPIED(ht, i)) {
+ p = ht->arBuckets[i];
fixup(p, src, dst);
- p = p->pNext;
}
}
}
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php