on 22/12/2011 21:47 Steve Kargl said the following:
> On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
>> on 22/12/2011 20:45 Steve Kargl said the following:
>>> I've used schedgraph to look at the ktrdump output.  A jpg is
>>> available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
>>> It shows the ping-pong effect, where 3 processes appear to be
>>> sharing 2 cpus while the remaining 2 processes are pinned to their
>>> cpus.
>>
>> I'd recommend enabling CPU-specific background colors via the menu in
>> schedgraph for a better illustration of your findings.
>>
>> NB: I still don't understand the point of purposefully running N+1
>> CPU-bound processes.
>
> The point is that this is a node in an HPC cluster with multiple
> users.  Sure, I can start my job on this node with only N cpu-bound
> jobs.  But when user John Doe wants to run his OpenMPI program, should
> he log into the 12 nodes in the cluster to see whether someone is
> already running N cpu-bound jobs on a given node?  4BSD gives my jobs
> and John Doe's jobs a fair share of the available cpus.  ULE does not
> give a fair share, and if you read the summary file I put up on the
> web, you see that it is fairly non-deterministic when an OpenMPI run
> will finish (see the mean absolute deviations in the table of 'real'
> times that I posted).
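For reference, the kind of workload under discussion boils down to roughly
the following -- not Steve's actual OpenMPI job, of course, just a trivial
generator of CPU-bound children.  Run it with ncpu + 1 workers and watch the
per-CPU distribution with top -P, or capture KTR_SCHED traces for schedgraph:

/*
 * Minimal stand-in for the "N+1 CPU-bound jobs" workload: fork the
 * requested number of children that do nothing but burn CPU.
 */
#include <sys/wait.h>

#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        volatile unsigned long spin = 0;
        int i, nworkers;

        nworkers = (argc > 1) ? atoi(argv[1]) : 2;

        for (i = 0; i < nworkers; i++) {
                switch (fork()) {
                case -1:
                        err(1, "fork");
                case 0:
                        for (;;)        /* pure CPU burn, never sleeps */
                                spin++;
                        /* NOTREACHED */
                }
        }
        while (wait(NULL) > 0)          /* parent just sits and waits */
                ;
        return (0);
}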
OK.  I think I know why the uneven load occurs; I remember even trying to
explain my observations at the time.  There are two things:

1. ULE has neither a run queue shared across all CPUs nor any other
   mechanism for enforcing true global fairness of CPU time sharing.
2. ULE's rebalancing code is biased, which leads to situations where
   sub-groups of threads share subsets of CPUs rather fairly among
   themselves, but there is no global fairness.

I haven't really given any thought to how to fix or work around these
issues.  One dumb idea is to add an element of randomness to the choice
between equally loaded CPUs (and CPU groups) instead of the current
permanent bias; a rough sketch of what I mean is appended after my
signature.

> There is the additional observation in one of my 2008
> emails (URLs have been posted) that if you have N+1
> cpu-bound jobs with, say, job0 and job1 ping-ponging
> on cpu0 (due to ULE's cpu-affinity feature) and if I
> kill job2 running on cpu1, then neither job0 nor job1
> will migrate to cpu1.  So, one now has N cpu-bound
> jobs running on N-1 cpus.

Have you checked recently whether that is still the case?  I would
consider this a rather serious bug, as opposed to merely sub-optimal
scheduling.

> Finally, my initial post in this email thread was to
> tell O. Hartman to quit beating his head against
> a wall with ULE (in an HPC environment).  Switch to
> 4BSD.  This was based on my 2008 observations and
> I've now wasted 2 days gathering additional information
> which only reaffirms my recommendation.

I think that any objective information has value, so maybe the time is
not really wasted.  There is no argument that for your usage pattern
4BSD is better than ULE at the moment, because of the inherent design
choices of both schedulers and their current implementations.  But I
think that ULE could be improved to provide more global fairness.

P.S.  But this thread has seen so many different problem reports about
ULE heaped together that it is very easy to get confused about what is
caused by what, and about what is real and what is not.  E.g. I don't
think that there is a direct relation between this issue (N+1 CPU-bound
tasks) and "my X is sluggish with ULE when I untar a large file".

P.P.S.  About the subject line: let's recall why ULE became the default.
It happened because of many observations from users and developers that
"things" were faster/"snappier" with ULE than with 4BSD, and because of
a significant stream of requests to make it the default.  So it's
business as usual: the schedulers are different, so there are those for
whom one scheduler works better, those for whom the other works better,
those for whom both work reasonably well, those for whom neither is
satisfactory, and those who don't really care or compare.  There is a
silent majority and there are vocal minorities.  There are specific bugs
and quirks, advantages and disadvantages, usage patterns, hardware
configurations and what not.  When everybody starts to talk at the same
time, it's a huge mess.  But silently triaging and debugging one problem
at a time doesn't always work either.  There, I've said it.  Let me now
try to recall why I felt the need to say all of this :-)

-- 
Andriy Gapon
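Appendix: a rough userland sketch of the tie-breaking idea mentioned above.
To be clear, this is not code from sys/kern/sched_ule.c -- the real CPU
selection there walks the CPU topology in sched_pickcpu() and friends -- it
only illustrates the policy of picking uniformly at random among the CPUs
that are tied for the lowest load, instead of always taking the first
(lowest-numbered) one:

#include <stdio.h>
#include <stdlib.h>

static int
pick_least_loaded(const int *load, int ncpu)
{
        int i, min, nties, pick;

        /* Find the minimal load and how many CPUs are tied for it. */
        min = load[0];
        nties = 1;
        for (i = 1; i < ncpu; i++) {
                if (load[i] < min) {
                        min = load[i];
                        nties = 1;
                } else if (load[i] == min)
                        nties++;
        }

        /* Break the tie uniformly at random instead of taking the first hit. */
        pick = arc4random_uniform(nties);
        for (i = 0; i < ncpu; i++)
                if (load[i] == min && pick-- == 0)
                        return (i);
        return (0);     /* not reached */
}

int
main(void)
{
        /* Made-up per-CPU load: CPUs 1, 2 and 3 are tied for the minimum. */
        int load[4] = { 5, 2, 2, 2 };
        int hits[4] = { 0, 0, 0, 0 };
        int i;

        for (i = 0; i < 100000; i++)
                hits[pick_least_loaded(load, 4)]++;
        for (i = 0; i < 4; i++)
                printf("cpu%d: picked %d times\n", i, hits[i]);
        return (0);
}

In the kernel the choice would of course have to be made per topology group
rather than over a flat array, and with a cheaper source of randomness, but
the principle is the same: no permanent bias towards one CPU or group when
the loads are equal.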