> Yes, this is the result of the hierarchical nature of cpusets, which already
> causes issues with the scheduler. It is rather typical that cpusets are
> used to partition the memory and cpus. Overlapping cpusets seem to have
> mainly an administrative function. Paul?
The heavyweight tasks, which are expected to apply serious memory pressure (whether from data pages or dirty file pages), are usually in non-overlapping cpusets or share the same cpuset. They are not placed in a cpuset that partially overlaps, or is a proper superset of, some other cpuset holding an active job.

The higher-level cpusets, such as the top cpuset or the one deeded over to the Batch Scheduler, are proper supersets of many other cpusets. We avoid putting anything heavyweight in those cpusets.

Sometimes, of course, a task turns out to be unexpectedly heavyweight. In that case we're mostly interested in function (the system keeps running), not performance. That is, if someone set up what Andrew described, with a job in a large cpuset sucking up all available memory from a job in a smaller cpuset contained within it, I don't think I'm tuning for optimum performance anymore. Rather, I'm just trying to keep the system running, and unrelated jobs unaffected, while we dig our way out of the hole. If the smaller job OOMs, that's tough nuggies. They asked for it.

Jobs in -unrelated- (non-overlapping) cpusets should ride out the storm with little or no impact on their performance.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401