Nick wrote: > The biggest issues may be the userspace > interface and a decent userspace management tool.
One possibility, perhaps, would be to have a boolean flag "sched_domain" on each cpuset, indicating whether it was a sched domain or not. If a cpuset had its sched_domain flag set, then that cpusets cpus_allowed mask would define a sched domain. Later Nick wrote: > In the (hopefully) common case where there are disjoint partitions > _somewhere_, sched domains can do the job in a much better > way than task cpu affinities (better isolation, multiprocessor > balancing shouldn't break down). > > Those users with overlapping CPU sets can then use task affinities > on top of sched domains partitions to get the desired result. Ok - seems it should work with the above cpuset flag marking sched domains, and a rule that _those_ cpusets so marked can't overlap. Other cpusets that are not so marked, and any sched_setaffinity calls, can do whatever they want. Trying to turn on the sched_domain flag on a cpuset that overlapped with existing such cpuset sched_domains, or trying to mess with the CPUs (cpus_allowed) in an existing cpuset sched_domain so as to force it to overlap, would return an error to user space on that write(2). If the sysadmin didn't mark any cpusets as sched_domains, then fall back to something automatic and useful. Inside the kernel, we'll need someway for the cpuset code to tell the sched code about sched_domain changes. This might mean something like the following. Have the sched code provide the cpuset code a couple of routines, one to setup and and the other to tear down sched_domains. Both calls would take a cpumask_t argument, and return void. The setup call must pass a cpumask that does not overlap any existing sched domains defined via cpusets. The tear down call must pass a cpumask value exactly matching a previous, still active, setup call. So if someone made a single CPU change to an existing sched_domain defining cpuset, the kernel cpuset code would have to call the kernel sched code twice, first to tear down the old sched_domain, and then to setup the new, slightly different, one. The cpuset code would likely be holding the single global cpuset_sem semaphore across this pair of calls. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/