On Sun, 2 Mar 2008, Jeff Roberson wrote:

jeff        2008-03-02 08:20:59 UTC

 FreeBSD src repository

 Modified files:
   sys/kern             sched_ule.c
 Log:
 Add support for the new cpu topology api:
  - When searching for affinity, search backwards in the tree from the last
    cpu we ran on while the thread still has affinity for the group.  This
    can take advantage of knowledge of shared L2 or L3 caches among a
    group of cores.
  - When searching for the least loaded cpu, find it via the least loaded
    path through the tree.  This load balances system bus links,
    individual cache levels, and hyper-threaded/SMT cores.
  - Make the periodic balancer recursively balance the highest and lowest
    loaded cpu across each link.
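
As a rough illustration of the "least loaded path" search in the list above, here is a minimal sketch in C. It is not the code from sched_ule.c: struct topo_node, topo_load(), topo_least_loaded_cpu() and topo_demo() are hypothetical names invented for this example, and the sketch assumes a symmetric tree in which every group at a given level contains the same number of cpus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified topology node; not the kernel's struct cpu_group. */
struct topo_node {
	const struct topo_node *children;	/* child groups, NULL for a leaf */
	int nchildren;				/* 0 for a leaf (a single cpu) */
	int cpu;				/* cpu id, leaves only */
	int load;				/* runnable threads, leaves only */
};

/* Sum the load of every cpu beneath this node. */
static int
topo_load(const struct topo_node *n)
{
	int i, total;

	assert(n != NULL);
	if (n->nchildren == 0)
		return (n->load);
	total = 0;
	for (i = 0; i < n->nchildren; i++)
		total += topo_load(&n->children[i]);
	return (total);
}

/* Descend into the least loaded child at each level until a cpu is reached. */
static int
topo_least_loaded_cpu(const struct topo_node *n)
{
	const struct topo_node *best;
	int i;

	assert(n != NULL);
	while (n->nchildren > 0) {
		best = &n->children[0];
		for (i = 1; i < n->nchildren; i++)
			if (topo_load(&n->children[i]) < topo_load(best))
				best = &n->children[i];
		n = best;
	}
	return (n->cpu);
}

/* Toy machine: two packages of two cores each. */
static int
topo_demo(void)
{
	static const struct topo_node pkg0[] = {
		{ NULL, 0, 0, 0 }, { NULL, 0, 1, 5 },	/* cpu 0 idle, cpu 1 busy */
	};
	static const struct topo_node pkg1[] = {
		{ NULL, 0, 2, 2 }, { NULL, 0, 3, 2 },
	};
	static const struct topo_node pkgs[] = {
		{ pkg0, 2, -1, 0 }, { pkg1, 2, -1, 0 },
	};
	static const struct topo_node root = { pkgs, 2, -1, 0 };

	return (topo_least_loaded_cpu(&root));
}
```

In this toy tree the idlest single cpu is cpu 0, but the search descends into the second package because its combined load (4) is lower than the first package's (5), so topo_demo() picks cpu 2. That is the sense in which following the least loaded path balances the shared links and caches rather than only the individual cpus.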

 Add support for cpusets:
  - Convert the cpuset to a simple native cpumask_t while the kernel still
    only supports cpumask.
  - Pass the derived cpumask down through the cpu_search functions to
    restrict the result cpus.
  - Make the various steal functions resilient to failure since not every
    thread can run on every cpu any longer.
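
The mask handling in the list above can be pictured with a small sketch in C. Everything here is an assumption made for illustration: toy_mask_t stands in for cpumask_t, TOY_NCPU is an arbitrary cpu count, and toy_lowest_allowed() and toy_pick() are hypothetical helpers, not the cpu_search functions from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_NCPU	8

typedef uint32_t toy_mask_t;	/* hypothetical stand-in for cpumask_t */

/*
 * Return the least loaded cpu whose bit is set in 'allowed',
 * or -1 if the mask permits none of the cpus.
 */
static int
toy_lowest_allowed(const int load[TOY_NCPU], toy_mask_t allowed)
{
	int best, cpu;

	assert(load != NULL);
	best = -1;
	for (cpu = 0; cpu < TOY_NCPU; cpu++) {
		if ((allowed & ((toy_mask_t)1 << cpu)) == 0)
			continue;		/* thread may not run here */
		if (best == -1 || load[cpu] < load[best])
			best = cpu;
	}
	return (best);
}

/* Run the search against a fixed, made-up load table. */
static int
toy_pick(toy_mask_t allowed)
{
	static const int load[TOY_NCPU] = { 3, 0, 4, 1, 2, 2, 2, 2 };

	return (toy_lowest_allowed(load, allowed));
}
```

With every cpu allowed, toy_pick(0xff) returns cpu 1, the idle one. Restricting the mask to cpus 2 and 3 with toy_pick(0x0c) returns cpu 3 instead, and an empty mask returns -1. The -1 case is the kind of failure a caller has to tolerate once threads can be restricted to a subset of cpus.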

 General improvements:
  - Precisely track the lowest priority thread on every runq with
    tdq_setlowpri().  Previously this was only advisory, which ended up
    causing pathological behavior.
  - Remove many #ifdef SMP conditions to simplify the code.
  - Get rid of the old cumbersome tdq_group.  This is more naturally
    expressed via the cpu_group tree.


With these changes, ULE is the only scheduler that supports the new cpuset API. The cpuset syscalls succeed on 4BSD, but that scheduler doesn't obey the masks. I don't presently plan to implement this on 4BSD, as searching the runq for a compatible thread on every context switch could be very inefficient. I won't object if someone else wants to implement it; otherwise I'll make the syscalls return ENOSYS if 4BSD is compiled in.
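
To make the cost concern concrete, here is a minimal sketch in C of what picking a compatible thread from a shared, priority-ordered queue could look like. It is an assumption-laden toy, not 4BSD code: struct toy_thread, toy_choose() and toy_demo() are hypothetical, and the queue is modeled as a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued thread with an affinity mask (bit n: may run on cpu n). */
struct toy_thread {
	int id;
	uint32_t allowed;
};

/*
 * Scan a priority-ordered queue for the first thread that may run on 'cpu'.
 * Returns its index, or -1 if none is compatible.  The number of entries
 * inspected is stored in *examined.
 */
static int
toy_choose(const struct toy_thread *q, size_t n, int cpu, size_t *examined)
{
	size_t i;

	assert(cpu >= 0 && cpu < 32);
	for (i = 0; i < n; i++) {
		if (q[i].allowed & ((uint32_t)1 << cpu)) {
			*examined = i + 1;
			return ((int)i);
		}
	}
	*examined = n;
	return (-1);
}

/* Three threads pinned to cpu 0 sit ahead of the one cpu 1 may run. */
static size_t
toy_demo(void)
{
	static const struct toy_thread q[] = {
		{ 1, 0x1 }, { 2, 0x1 }, { 3, 0x1 }, { 4, 0x2 },
	};
	size_t examined = 0;
	int idx;

	idx = toy_choose(q, 4, 1, &examined);
	return (idx == 3 ? examined : 0);
}
```

Here cpu 1 has to inspect all four queued threads before it finds one it is allowed to run. In the worst case the scan is linear in the queue length, and it would be repeated on every context switch.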

The improved cpu topology load balancing is a mixed bag. On some workloads we see considerable improvements. Right now mysql suffers when it has large numbers of threads, but other things seem much improved. I will continue to tune this, however, and in most cases it's already a win.

Kris has done some excellent benchmarking as usual. Here you can see the improvement in postgres depending on various scheduler debug settings:

http://people.freebsd.org/~kris/scaling/pgsql-16cpu.png

The horrible green line is 7.0 for reference. The blue line is the same 16-core machine with half of the cores disabled.

Thanks,
Jeff


 Sponsored by:   Nokia
 Testing by:     kris

 Revision  Changes    Path
 1.226     +443 -501  src/sys/kern/sched_ule.c

_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
