bert hubert wrote:
Dear FreeBSD hackers,
I'm working on improving the PowerDNS recursor for a big FreeBSD-loving
internet provider in The Netherlands and I am hitting some snags. I also
hope this is the appropriate list to share my concerns.
Pruning the cache is very slow on the provider's FreeBSD 6.0 x86 systems,
whereas it flies on other operating systems.
I've managed to boil down the problem to the code found on
http://ds9a.nl/tmp/cache-test.cc which can be compiled with:
'g++ -O3 -I/usr/local/include cache-test.cc -o cache-test' after installing
Boost from the ports.
The problem exists both with the system compiler and with a self-compiled
g++ 4.1.
Here are some typical timings:
$ ./cache-test
Creating..
Copying 499950 nodes
100 345 usec 3.45 usec/erase
300 3298 usec 10.99 usec/erase
500 8749 usec 17.50 usec/erase
700 72702 usec 103.86 usec/erase
900 46521 usec 51.69 usec/erase
On another operating system with almost the same cpu:
$ ./cache-test
Creating..
Copying 499950 nodes
100 62 usec 0.62 usec/erase
300 187 usec 0.62 usec/erase
500 347 usec 0.69 usec/erase
700 419 usec 0.60 usec/erase
900 575 usec 0.64 usec/erase
I've toyed with MALLOC_OPTIONS, especially the >> options; I've tried
GLIBCXX_FORCE_NEW; and I've tried specifying a different STL allocator in the
C++ code. None of it changes a thing.
A quick gprof profile shows a tremendous number of calls to 'ifree', but that
may be due to the copying of the container going on between test runs.
Any help would be much appreciated, as I am all out of clues.
Thanks.
I ran cache-test on -current using phkmalloc and a couple of different
versions of jemalloc. jemalloc does not appear to have the same issue
for this test. It isn't obvious to me why phkmalloc is performing so
poorly, but I think you can assume that this is a malloc performance
problem.
The following jemalloc results were run with NO_MALLOC_EXTRAS defined.
I included the patch results because I expect to commit the patch this week.
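phkmalloc and jemalloc have similar memory usage (maximum resident set
sizes of 279240 and 275752, respectively), but jemalloc is substantially
faster: 38.58 seconds of real time versus 495.53. The patched jemalloc uses
substantially less memory than either phkmalloc or unpatched jemalloc
(a maximum resident set size of 222976).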
Jason
------- phkmalloc: -----------------------------------------------------
onyx:~> MALLOC_OPTIONS=aj LD_PRELOAD=/tmp/phkmalloc/libc/libc.so.6 =time -l ./cache-test
Creating..
Copying 499950 nodes
100 501 usec 5.01 usec/erase
300 53183 usec 177.28 usec/erase
500 5491 usec 10.98 usec/erase
700 158989 usec 227.13 usec/erase
900 47491 usec 52.77 usec/erase
1100 324948 usec 295.41 usec/erase
1300 106480 usec 81.91 usec/erase
1500 522414 usec 348.28 usec/erase
1700 155604 usec 91.53 usec/erase
1900 685235 usec 360.65 usec/erase
2100 230939 usec 109.97 usec/erase
2300 860083 usec 373.95 usec/erase
2500 234910 usec 93.96 usec/erase
2700 1226310 usec 454.19 usec/erase
2900 205739 usec 70.94 usec/erase
3100 1379395 usec 444.97 usec/erase
3300 296925 usec 89.98 usec/erase
3500 1620705 usec 463.06 usec/erase
3700 312343 usec 84.42 usec/erase
3900 1835125 usec 470.54 usec/erase
4100 306443 usec 74.74 usec/erase
4300 1805999 usec 420.00 usec/erase
4500 323179 usec 71.82 usec/erase
4700 1593007 usec 338.94 usec/erase
4900 316249 usec 64.54 usec/erase
495.53 real 494.29 user 1.17 sys
279240 maximum resident set size
60 average shared memory size
274524 average unshared data size
128 average unshared stack size
78238 page reclaims
1 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
4 voluntary context switches
6492 involuntary context switches
------- jemalloc (-current): -------------------------------------------
onyx:~> MALLOC_OPTIONS=aj LD_PRELOAD=/tmp/jemalloc/libc/libc.so.6 =time -l ./cache-test
Creating..
Copying 499950 nodes
100 281 usec 2.81 usec/erase
300 586 usec 1.95 usec/erase
500 1008 usec 2.02 usec/erase
700 973 usec 1.39 usec/erase
900 1489 usec 1.65 usec/erase
1100 2269 usec 2.06 usec/erase
1300 2493 usec 1.92 usec/erase
1500 3337 usec 2.22 usec/erase
1700 3815 usec 2.24 usec/erase
1900 3511 usec 1.85 usec/erase
2100 4493 usec 2.14 usec/erase
2300 4235 usec 1.84 usec/erase
2500 6043 usec 2.42 usec/erase
2700 5474 usec 2.03 usec/erase
2900 7670 usec 2.64 usec/erase
3100 6104 usec 1.97 usec/erase
3300 10923 usec 3.31 usec/erase
3500 4560 usec 1.30 usec/erase
3700 9998 usec 2.70 usec/erase
3900 8023 usec 2.06 usec/erase
4100 15031 usec 3.67 usec/erase
4300 5588 usec 1.30 usec/erase
4500 15490 usec 3.44 usec/erase
4700 6544 usec 1.39 usec/erase
4900 14565 usec 2.97 usec/erase
38.58 real 37.98 user 0.57 sys
275752 maximum resident set size
60 average shared memory size
12 average unshared data size
128 average unshared stack size
68494 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
1 voluntary context switches
1180 involuntary context switches
------- jemalloc (patch): ----------------------------------------------
(http://people.freebsd.org/~jasone/jemalloc/patches/jemalloc_20060315a.diff)
onyx:~> MALLOC_OPTIONS=aj LD_PRELOAD=/usr/obj/usr/src/lib/libc/libc.so.6 =time -l ./cache-test
Creating..
Copying 499950 nodes
100 232 usec 2.32 usec/erase
300 912 usec 3.04 usec/erase
500 2514 usec 5.03 usec/erase
700 2008 usec 2.87 usec/erase
900 3255 usec 3.62 usec/erase
1100 2931 usec 2.66 usec/erase
1300 4010 usec 3.08 usec/erase
1500 3486 usec 2.32 usec/erase
1700 4675 usec 2.75 usec/erase
1900 2992 usec 1.57 usec/erase
2100 2417 usec 1.15 usec/erase
2300 4986 usec 2.17 usec/erase
2500 4000 usec 1.60 usec/erase
2700 5990 usec 2.22 usec/erase
2900 3661 usec 1.26 usec/erase
3100 4702 usec 1.52 usec/erase
3300 5934 usec 1.80 usec/erase
3500 7999 usec 2.29 usec/erase
3700 5998 usec 1.62 usec/erase
3900 6489 usec 1.66 usec/erase
4100 6997 usec 1.71 usec/erase
4300 7965 usec 1.85 usec/erase
4500 7849 usec 1.74 usec/erase
4700 8456 usec 1.80 usec/erase
4900 7814 usec 1.59 usec/erase
37.13 real 35.86 user 1.22 sys
222976 maximum resident set size
59 average shared memory size
11 average unshared data size
127 average unshared stack size
104136 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
2 voluntary context switches
1162 involuntary context switches