Barry Rountree wrote:
> On Thu, Apr 24, 2008 at 12:56:03PM +0200, Ingo Josopait wrote:
>> I am using one of the nodes as a desktop computer. Therefore it is
>> most important for me that the mpi program is not so greedily
>> acquiring cpu time.
>
> This is a kernel scheduling issue, not an OpenMPI issue. Busy waiting
> in one process should not cause noticeable loss of responsiveness in
> other processes. Have you experimented with the "nice" command?

I don't think this is a kernel issue. In the current OpenMPI
implementation, when MPI is waiting for new messages, it simply spins
in a loop until they arrive. The kernel then has no way of knowing
whether the program is doing useful calculations or merely busy
waiting. If, on the other hand, MPI told the kernel that it is waiting
for new messages, the kernel could give the CPU time to background
programs, or put the CPU into an idle state if no other program is
running (which would lower the energy consumption).
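To illustrate what I mean (this is only a sketch, not OpenMPI code --
the file descriptor stands in for whatever the interconnect actually
provides): a process that blocks in poll() uses no CPU time at all
until data arrives, and the kernel is free to run other programs or
drop the CPU into a low-power state in the meantime:

#include <poll.h>

/* Sketch: block until a message arrives on 'fd'.  While we sit in
 * poll(), the kernel knows this process is idle. */
static int wait_for_message(int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    return poll(&pfd, 1, -1);   /* timeout -1: block indefinitely */
}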
>> But I would imagine that the energy consumption is generally a big
>> issue, since energy is a major cost factor in a computer cluster.
>
> Yup.
>
>> When a cpu is idle, it uses considerably less energy. Last time I
>> checked, my computer used 180W when both cpu cores were working and
>> 110W when both cores were idle.
>
> What processor is this?

Athlon X2 6000+ (3 GHz)

>> I just made a small hack to solve the problem. I inserted a simple
>> sleep call into the function 'opal_condition_wait':
>>
>> --- orig/openmpi-1.2.6/opal/threads/condition.h
>> +++ openmpi-1.2.6/opal/threads/condition.h
>> @@ -78,6 +78,7 @@
>>  #endif
>>      } else {
>>          while (c->c_signaled == 0) {
>> +            usleep(1000);
>>              opal_progress();
>>          }
>>      }
>
> I expect this would lead to increased execution time for all programs
> and increased energy consumption for most programs. Recall that
> energy is power multiplied by time. You're reducing the power on some
> nodes and increasing time on all nodes.
>
>> The usleep call will let the program sleep for about 4 ms (it won't
>> sleep for a shorter time because of the timer granularity). But that
>> is good enough for me. The cpu usage is (almost) zero when the tasks
>> are waiting for one another.
>
> I think your mistake here is considering CPU load to be a useful
> metric. It isn't. Responsiveness is a useful metric, energy is a
> useful metric, but CPU load isn't a reliable guide to either of
> these.
>
>> For a proper implementation you would want to actively poll without
>> a sleep call for a few milliseconds, and then use some other method
>> that sleeps not for a fixed time, but until new messages arrive.
>
> Well, it sounds like you can get to this before I can. Post your
> patch here and I'll test it on the NAS suite, UMT2K, Paradis, and a
> few synthetic benchmarks I've written. The cluster I use has
> multimeters hooked up, so I can also let you know how much energy is
> being saved.
>
> Barry Rountree
> Ph.D. Candidate, Computer Science
> University of Georgia

Here is a slightly more sophisticated patch:

--- orig/openmpi-1.2.6/opal/threads/condition.h	2006-11-09 19:53:32.000000000 +0100
+++ openmpi-1.2.6/opal/threads/condition.h	2008-04-24 17:15:29.000000000 +0200
@@ -77,7 +77,11 @@
         }
 #endif
     } else {
+        int nosleep_counter = 300000;
         while (c->c_signaled == 0) {
+            if (--nosleep_counter < 0) {
+                usleep(1000);
+            }
             opal_progress();
         }
     }

It will actively poll for a short time (about 0.1 seconds on my 2 GHz
Athlon64 laptop; this can be adjusted by choosing a number other than
300000), and after that it will sleep for about 4 ms in each loop
cycle.

You may test it. It should not increase the latency by much. The CPU
usage (as displayed by 'top') is nearly zero when waiting for new
data, and judging from the noise level of my laptop fan, the CPU uses
far less power.

A better solution would certainly be to use some other blocking
mechanism, but as others have said in this thread, this seems to be a
bit less trivial.
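For completeness, here is a sketch of the kind of blocking mechanism I
mean, written with plain pthreads rather than against the opal
internals (all the names are made up, and it assumes that something
else -- a progress thread or the network stack -- actually signals the
condition, which is exactly the non-trivial part):

#include <pthread.h>
#include <time.h>

/* Sketch: poll actively for a while, then block on a condition
 * variable, waking at least every 4 ms as a safety net. */
static void wait_spin_then_block(volatile int *signaled,
                                 pthread_mutex_t *lock,
                                 pthread_cond_t *cond)
{
    long spins = 300000;

    /* Phase 1: active polling, as in the patch above. */
    while (*signaled == 0 && --spins > 0)
        ;   /* the real code would call opal_progress() here */

    /* Phase 2: block until signaled, or until the 4 ms timeout. */
    pthread_mutex_lock(lock);
    while (*signaled == 0) {
        struct timespec ts;

        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_nsec += 4 * 1000 * 1000;
        if (ts.tv_nsec >= 1000000000L) {
            ts.tv_sec += 1;
            ts.tv_nsec -= 1000000000L;
        }
        pthread_cond_timedwait(cond, lock, &ts);
    }
    pthread_mutex_unlock(lock);
}

The timeout keeps the safety-net behaviour of the usleep() version,
but when a signal does arrive the waiter wakes up immediately instead
of at the next timer tick.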