Hi Mitchell,

I do work for Sun, but I don't consider myself biased towards the slab 
allocator or any other Solaris or Sun code. I know we've got plenty of 
improvements to make!

That said, your example is not multi-threaded. There are two major performance 
issues which come up with a list structure in a multi-threaded environment. The 
slab allocator addresses both of these.

The first is locking. In a multi-threaded environment, your 
node_alloc/node_free would have to lock the free list head. This leads to 
contention between threads. (Yes, I know about lockfree algorithms, but they 
don't eliminate contention; and they are even more susceptible to the second 
problem.)
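For concreteness, here's roughly what the locked version would have to look 
like (node_t, node_alloc, and node_free are my stand-in names for your example; 
note that every caller, on every CPU, serializes on free_lock):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical fixed-size object; any node type works the same way. */
typedef struct node {
    struct node *next;
    int payload;
} node_t;

/* One global free list: every CPU contends on this lock, and on the
 * cache line holding free_head. */
static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;
static node_t *free_head;

static node_t *node_alloc(void) {
    pthread_mutex_lock(&free_lock);
    node_t *n = free_head;
    if (n != NULL)
        free_head = n->next;        /* pop the head */
    pthread_mutex_unlock(&free_lock);
    if (n == NULL)
        n = malloc(sizeof(*n));     /* list empty: fall back to malloc */
    return n;
}

static void node_free(node_t *n) {
    pthread_mutex_lock(&free_lock);
    n->next = free_head;            /* push onto the head */
    free_head = n;
    pthread_mutex_unlock(&free_lock);
}
```

With many threads allocating and freeing at once, free_lock is exactly the 
contention point described above.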

The second is cache line movement. Each time a CPU reads the memory location 
containing the head of the free list, it must fetch that data from memory or 
from another CPU's cache. Each time a CPU updates that location (which happens 
on every allocation and free), the cache line must be invalidated in all other 
CPUs' caches. Hence ownership of the cache line moves between CPUs. This is 
incredibly expensive (it's a major reason that locking is slow, actually). It 
can take tens to hundreds of cycles.
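To illustrate (this is a toy example, not Solaris code): if each thread's 
hot data is padded out to its own cache line, ownership never has to move. 
Remove the padding so the two counters share a line, and every store by one 
thread invalidates the line in the other CPU's cache, just like a shared 
free-list head on every alloc/free:

```c
#include <assert.h>
#include <pthread.h>

#define NITER (10 * 1000 * 1000L)

/* Each thread increments its own counter. The padding (assuming
 * 64-byte cache lines) gives each counter its own line, so the line
 * stays owned by one CPU. Without the padding the counters would
 * share a line and ping-pong between caches on every store. */
struct padded_counter {
    volatile long v;
    char pad[64 - sizeof(long)];
};

static struct padded_counter counters[2];

static void *worker(void *arg) {
    struct padded_counter *c = arg;
    for (long i = 0; i < NITER; i++)
        c->v++;                     /* dirties only this thread's line */
    return NULL;
}
```

The padded and unpadded layouts compute the same result; only the cache-line 
traffic (and therefore the wall-clock time) differs.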

How does the slab allocator avoid these? By taking advantage of a “current 
CPU” index that the operating system makes available to the thread. The 
allocator uses this to index into a set of cached objects recently freed on the 
current CPU, and allocates from and frees into that set. This has the advantages that 
(a) the lock protecting the set is very rarely contended, since a thread would 
have to be pre-empted during the run of the allocator to have its current CPU 
change; (b) the data structures describing the set are not shared between CPUs 
and hence can stay in one cache; (c) the objects themselves tend to have been 
freed from the current CPU and hence are somewhat more likely to be in the 
current CPU's cache. (I'm not sure if that last has a measurable effect.)
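Here's a much-simplified user-level sketch of that per-CPU layer. All names 
and sizes (cpu_cache_t, obj_alloc, MAG_SIZE) are illustrative, not the actual 
kmem implementation, and sched_getcpu() (glibc) stands in for the kernel's 
native current-CPU index:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>      /* sched_getcpu() -- glibc extension */
#include <stdlib.h>

#define MAG_SIZE 16
#define MAX_CPUS 64

/* Per-CPU cache of recently freed objects, padded so no two CPUs
 * share a cache line. */
typedef struct cpu_cache {
    pthread_mutex_t lock;   /* rarely contended: only taken by threads
                             * currently running on (or just migrated
                             * off) this CPU */
    int count;
    void *objs[MAG_SIZE];
    char pad[64];
} cpu_cache_t;

static cpu_cache_t cpu_caches[MAX_CPUS];

static void caches_init(void) {
    for (int i = 0; i < MAX_CPUS; i++)
        pthread_mutex_init(&cpu_caches[i].lock, NULL);
}

static int cur_cpu(void) {
    int cpu = sched_getcpu();
    return (cpu < 0) ? 0 : cpu % MAX_CPUS;
}

static void *obj_alloc(size_t size) {
    cpu_cache_t *cc = &cpu_caches[cur_cpu()];
    void *obj = NULL;

    pthread_mutex_lock(&cc->lock);
    if (cc->count > 0)
        obj = cc->objs[--cc->count];    /* hit: freed recently on this CPU */
    pthread_mutex_unlock(&cc->lock);

    if (obj == NULL)
        obj = malloc(size);  /* miss: fall back to the global layer (the
                              * real allocator refills a whole magazine) */
    return obj;
}

static void obj_free(void *obj) {
    cpu_cache_t *cc = &cpu_caches[cur_cpu()];

    pthread_mutex_lock(&cc->lock);
    if (cc->count < MAG_SIZE) {
        cc->objs[cc->count++] = obj;    /* keep for this CPU */
        obj = NULL;
    }
    pthread_mutex_unlock(&cc->lock);

    if (obj != NULL)
        free(obj);          /* cache full: return to the global layer */
}
```

In the common case a thread only ever touches its own CPU's cache, which is 
what delivers points (a) and (b) above.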

If you’re interested in learning more, I’d suggest reading the original paper, 
which you can find at:

  http://citeseer.ist.psu.edu/bonwick94slab.html

The slab allocator does somewhat more, such as allowing objects that require 
initialization to be reused more cheaply than a fresh allocation plus 
initialization, but the above are the key scalability insights as I understand 
them.
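The constructed-object reuse works roughly like this sketch (names are mine; 
the real interface is kmem_cache_create() and friends, and real code would 
also run a destructor when returning memory to the system):

```c
#include <assert.h>
#include <stdlib.h>

/* Objects are constructed once, when fresh memory is first handed out.
 * Freed objects are kept in their constructed state, so a later
 * allocation skips re-initialization entirely. */
typedef struct obj_cache {
    void (*ctor)(void *);       /* run once per object lifetime */
    void  *freelist[32];
    int    count;
    size_t size;
} obj_cache_t;

static void *cache_alloc(obj_cache_t *c) {
    if (c->count > 0)
        return c->freelist[--c->count]; /* already constructed: cheap */
    void *o = malloc(c->size);
    if (o != NULL)
        c->ctor(o);                     /* construct only fresh memory */
    return o;
}

static void cache_free(obj_cache_t *c, void *o) {
    if (c->count < 32)
        c->freelist[c->count++] = o;    /* stays constructed */
    else
        free(o);
}

/* Example constructor: counts invocations so reuse is observable. */
static int ctor_calls;
static void sample_ctor(void *o) {
    (void)o;        /* real code would e.g. initialize embedded locks */
    ctor_calls++;
}
```

An alloc/free/alloc cycle invokes the constructor once, not twice, which is 
the saving over allocation+initialization.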

Your approach will certainly perform better on a single processor, and might on 
a dual processor. Beyond that, its scalability is likely poor. And, as Casper 
pointed out, keeping private memory pools for each subsystem tends to increase 
kernel memory usage.
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
