On Sun, 19 Sep 2010, Andriy Gapon wrote:
on 19/09/2010 01:16 Jeff Roberson said the following:
Not specifically in reaction to Robert's comment, but I would like to add my
thoughts to this notion of resource balancing in buckets. I really prefer not
to do any specific per-zone tuning except in extreme cases. This is because
quite often the decisions we make don't apply to some class of machines or
workloads. I would instead prefer to keep the algorithm adaptable.
Agree.
I like the idea of weighting the bucket decisions by the size of the item.
Obviously this has some flaws with compound objects, but in the general case it
is good. We should consider increasing the cost of bucket expansion based on
the size of the item. Right now buckets are expanded fairly readily.
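As a rough sketch of what that weighting could look like (BUCKET_MAX is the existing cap in uma_core.c; the byte budget and the helper itself are assumptions, not existing code), something along these lines would cap a zone's bucket size by item size:

/*
 * Illustrative only, meant to live next to the bucket code in
 * uma_core.c: scale the per-bucket item limit by item size so that
 * large-item zones end up with small buckets while small-item zones
 * keep today's behaviour.  BUCKET_MAX is the existing global cap; the
 * byte budget and this helper are hypothetical.
 */
#define	BUCKET_BYTES_MAX	(32 * 1024)	/* assumed per-bucket byte budget */

static int
bucket_limit_for_item(size_t item_size)
{
	int limit;

	limit = BUCKET_BYTES_MAX / item_size;	/* fewer items when items are big */
	if (limit > BUCKET_MAX)
		limit = BUCKET_MAX;		/* never exceed the existing cap */
	if (limit < 1)
		limit = 1;			/* always allow at least one item */
	return (limit);
}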
We could also consider decreasing the default bucket size for a zone based on vm
pressure and use. Right now there is no downward pressure on bucket size, only
upward based on trips to the slab layer.
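A sketch of what that downward pressure might look like (zone_foreach(), ZONE_LOCK() and uz_count are real uma_core.c/uma_int.h internals, but shrinking uz_count this way and hooking the routine into uma_reclaim(), which already runs under VM pressure, are only assumptions):

/*
 * Sketch only: halve every zone's current bucket-size target when the
 * system is under memory pressure, so that future buckets come out
 * smaller.  The locking shown is the minimum; the real change would
 * need more care.
 */
static void
zone_shrink_bucket_target(uma_zone_t zone)
{
	ZONE_LOCK(zone);
	if (zone->uz_count > 1)
		zone->uz_count /= 2;	/* future buckets are allocated smaller */
	ZONE_UNLOCK(zone);
}

static void
uma_shrink_bucket_targets(void)
{
	/* Walk every zone; hypothetically called from uma_reclaim(). */
	zone_foreach(zone_shrink_bucket_target);
}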
Additionally we could make a last-ditch flush mechanism that runs on each cpu in
turn and flushes some or all of the buckets in per-cpu caches. Presently that is
not done due to synchronization issues; it can't be done from a central place.
It could be done with a callout mechanism or a for loop that binds to each core
in succession.
I like all three of the above approaches.
The last one is a bit hard to implement; the first two seem easier.
All the last one requires is a loop calling sched_bind() on each available
cpu.
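For reference, a minimal sketch of that loop (sched_bind()/sched_unbind() and the mp_maxid/CPU_ABSENT() walk are real; cache_drain_curcpu() is a hypothetical stand-in for whatever actually pushes the local buckets back to the zone, which would live in uma_core.c):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/sched.h>
#include <sys/smp.h>
#include <vm/uma.h>

/*
 * Last-ditch flush sketch: migrate to each CPU in turn and drain that
 * CPU's per-CPU buckets back into the zone.
 */
static void
uma_flush_pcpu_caches(uma_zone_t zone)
{
	struct thread *td = curthread;
	int cpu;

	for (cpu = 0; cpu <= mp_maxid; cpu++) {
		if (CPU_ABSENT(cpu))
			continue;
		thread_lock(td);
		sched_bind(td, cpu);	/* now executing on 'cpu' */
		thread_unlock(td);

		cache_drain_curcpu(zone);	/* hypothetical: flush this CPU's buckets */
	}

	thread_lock(td);
	sched_unbind(td);
	thread_unlock(td);
}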
I believe the combination of these approaches would go a long way toward solving
the problem and should require relatively little new code. It should also preserve
the adaptable nature of the system without penalizing resource-heavy systems. I
would be happy to review patches from anyone who wishes to undertake it.
FWIW, the approach of simply limiting maximum bucket size based on item size
seems to work rather well too, as my testing with zfs+uma shows.
I will also try to add code to completely bypass the per-cpu cache for "really
huge" items.
I don't like this because even with very large buffers you can still have
high enough turnover to require per-cpu caching. Kip specifically added
UMA support to address this issue in zfs. If you have allocations that are
very large and don't require per-cpu caching, why even use UMA?
One thing that would be nice, if we are frequently using page-sized
allocations, is to eliminate the requirement for a slab header for each
page. It seems unnecessary for any zone where the number of items per slab
is 1, but it would require careful modification to support properly.
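To make the single-item case concrete (illustrative arithmetic only; it ignores the header overhead and alignment that UMA's real keg setup accounts for):

/*
 * With a page-sized item there is exactly one item per slab, so the
 * slab header describes nothing but that single item and becomes pure
 * per-page overhead.
 */
static int
items_per_slab(size_t slab_size, size_t item_size)
{
	return (slab_size / item_size);	/* == 1 when item_size == slab_size */
}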
Thanks,
Jeff
--
Andriy Gapon