What George said is what I meant by "it's a non-trivial amount of work." :-)

In addition to the patches George describes below (allowing components to register for blocking progress), there's going to be some work to deal with shared memory (we have some ideas here, but it's a bit more than just allowing shmem to register for blocking progress), plus other random issues that will arise.


On Apr 24, 2008, at 11:17 AM, George Bosilca wrote:

Well, to block or not to block, that is the question!!! Unfortunately, it's more complex than this thread seems to indicate. It's not that we didn't want to implement it in Open MPI; it's that at one point we had to make a choice ... and we decided to always go for performance first.

However, there was some experimentation with going into blocking mode, at least when only TCP is used. Unfortunately, this breaks some other things in Open MPI because of our progression model. We are component-based, and these components are allowed to register callbacks that get called periodically ... and here "periodically" means as often as possible. There are at least two components that use this mechanism for their own progression: ROMIO (mca/io/romio) and one-sided communications (mca/osc/*). Switching into blocking mode would break these two components completely. That is the reason why we don't block even when only TCP is used.
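To make that polling model concrete, here is roughly what it looks like from a component's point of view. This is a simplified sketch from memory (my_component_progress and my_component_init are made-up names, not real Open MPI symbols); the registration call is the one in opal/runtime/opal_progress.h:

#include "opal/runtime/opal_progress.h"

/* Hypothetical component-internal progress hook.  opal_progress()
 * invokes every registered callback on each iteration -- i.e., as
 * often as possible -- which is why spinning on opal_progress()
 * pegs the CPU even when nothing is happening. */
static int my_component_progress(void)
{
    int completed = 0;
    /* ... poll component state, advance pending operations,
     * bump `completed` for each one that finishes ... */
    return completed;
}

static int my_component_init(void)
{
    /* From now on, every opal_progress() call will poll us too. */
    return opal_progress_register(my_component_progress);
}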

Anyway, there is a solution. We have to move from poll-based progress for these components to event-based progress. There were some discussions, and if I remember well ... everybody is waiting for one of my patches :) A patch that allows a component to add a completion callback to MPI requests ... I don't have a clear deadline for this, and unfortunately I'm a little busy right now ... but I'll work on it ASAP.
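To give a flavor of the shape such an API might take -- this is only my sketch of the idea, not the actual patch:

/* Sketch only.  Instead of being polled on every opal_progress()
 * iteration, a component attaches a callback to a specific request
 * and is invoked only when that request completes.  With nobody
 * needing to poll, the event loop is then free to block. */
typedef void (*ompi_request_complete_cb_t)(struct ompi_request_t *req,
                                           void *cbdata);

int ompi_request_set_completion_cb(struct ompi_request_t *req,
                                   ompi_request_complete_cb_t cb,
                                   void *cbdata);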

 george.

On Apr 24, 2008, at 9:43 AM, Barry Rountree wrote:

On Thu, Apr 24, 2008 at 12:56:03PM +0200, Ingo Josopait wrote:
I am using one of the nodes as a desktop computer. Therefore it is most important for me that the MPI program does not acquire CPU time so greedily.

This is a kernel scheduling issue, not an Open MPI issue. Busy-waiting in one process should not cause a noticeable loss of responsiveness in other processes. Have you experimented with the "nice" command (e.g., launching the MPI processes under "nice -n 19")?

But I would imagine that the energy consumption is generally a big
issue, since energy is a major cost factor in a computer cluster.

Yup.

When a CPU is idle, it uses considerably less energy. Last time I checked, my computer used 180 W when both CPU cores were working and 110 W when both cores were idle.

What processor is this?


I just made a small hack to solve the problem. I inserted a simple sleep
call into the function 'opal_condition_wait':

--- orig/openmpi-1.2.6/opal/threads/condition.h
+++ openmpi-1.2.6/opal/threads/condition.h
@@ -78,6 +78,7 @@
 #endif
     } else {
         while (c->c_signaled == 0) {
+            usleep(1000);
             opal_progress();
         }
     }
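(One nit if anyone wants to reproduce this: usleep(3) is declared in <unistd.h>, so the patch may also need to add that include if condition.h doesn't already pull it in.)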


I expect this would lead to increased execution time for all programs
and increased energy consumption for most programs. Recall that energy
is power multiplied by time.  You're reducing the power on some nodes
and increasing time on all nodes.
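To put rough numbers on that, using your own 180 W / 110 W figures (purely illustrative arithmetic): a run that takes 100 s flat-out costs 180 W x 100 s = 18.0 kJ per node. If sleeping lets a node sit at 110 W for 20 s of waiting but stretches the run to 115 s, the cost becomes 95 s x 180 W + 20 s x 110 W = 19.3 kJ -- more energy, despite the lower average power.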

The usleep call will let the program sleep for about 4 ms (it won't sleep for a shorter time because of the kernel's timer granularity). But that is good enough for me. The CPU usage is (almost) zero when the tasks are waiting for one another.
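(A quick, throwaway way to check that granularity on a given kernel -- nothing Open MPI-specific about it:)

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    struct timeval t0, t1;
    int i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < 100; i++)
        usleep(1000);                     /* ask for 1 ms each time */
    gettimeofday(&t1, NULL);

    /* On a 250 Hz-tick kernel this prints roughly 4000 us per call,
     * not the 1000 us requested. */
    printf("average sleep: %.0f us\n",
           ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_usec - t0.tv_usec)) / 100.0);
    return 0;
}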

I think your mistake here is considering CPU load to be a useful metric. It isn't. Responsiveness is a useful metric, energy is a useful metric,
but CPU load isn't a reliable guide to either of these.

For a proper implementation you would want to actively poll without a sleep call for a few milliseconds, and then use some other method that
sleeps not for a fixed time, but until new messages arrive.
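(A spin-then-back-off loop along those lines is easy to sketch. Truly sleeping "until new messages arrive" needs the event-based progress George describes above, so this sketch only approximates it with progressively longer naps; none of these names except opal_progress are real Open MPI symbols:)

#include <unistd.h>

extern void opal_progress(void);  /* from opal/runtime/opal_progress.h */

#define SPIN_POLLS  2000    /* poll flat-out first, to keep latency low */
#define MAX_NAP_US  4000    /* cap naps around one scheduler tick       */

/* Sketch: spin for SPIN_POLLS iterations, then sleep between polls,
 * doubling the nap up to MAX_NAP_US.  Latency-sensitive waits finish
 * during the spin phase; long waits end up nearly idle. */
static void wait_until_signaled(volatile int *signaled)
{
    long polls = 0, nap_us = 100;

    while (!*signaled) {
        opal_progress();              /* advance outstanding messages */
        if (++polls > SPIN_POLLS) {
            usleep(nap_us);           /* back off once spinning has
                                       * burned enough CPU            */
            if (nap_us < MAX_NAP_US)
                nap_us *= 2;
        }
    }
}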

Well, it sounds like you can get to this before I can. Post your patch
here and I'll test it on the NAS suite, UMT2K, Paradis, and a few
synthetic benchmarks I've written.  The cluster I use has multimeters
hooked up so I can also let you know how much energy is being saved.

Barry Rountree
Ph.D. Candidate, Computer Science
University of Georgia






--
Jeff Squyres
Cisco Systems
