Hi, Just ran some tbench numbers (from dbench-3.04), on a 2 socket, 8 core x86 system, with 1 NUMA node per socket. With kernel 2.6.24-rc2, comparing slab vs slub allocators.
I run from 1 to 16 client threads, 5 times each, and restarting the tbench server between every run. I'm just taking the highest of each of the 5 tests (because the scheduler placement can sometimes be poor). It's not completely scientific, but from the graph you can guess it is relatively stable and seems significant. Summary: slub is consistently slower. When all CPUs are saturated, it is around 20% slower. Attached is a graph (x is nrclients, y is throughput MB/s) If I can help with reproducing it or testing anything, let me know. I'll be trying out a few other benchmarks too... anything you want me to test specifically and I can try. Thanks, Nick
<<attachment: slab.png>>