Yes, I am (the master and child 1 are running on the same machine). But knowing about the oversubscription issue, I am using mpi_yield_when_idle, which should fix precisely this problem, right? Or is the option ignored when there is initially no second process? I did give both machines multiple slots, so Open MPI "knows" that oversubscription may arise.

Confused,
Murat

Jeff Squyres wrote:
> Are you perchance oversubscribing your nodes?
>
> Open MPI does not currently handle it well when you initially
> undersubscribe your nodes but then, due to spawning, oversubscribe
> them. In this case, OMPI will be aggressively polling in all
> processes, not realizing that the node is now oversubscribed and that
> it should be yielding the processor so that other processes can run.
>
> On Oct 30, 2007, at 10:57 AM, Murat Knecht wrote:
>
>> Hi,
>>
>> does anyone know whether there is a special requirement on the order
>> of spawning processes and the subsequent merging of the
>> intercommunicators?
>> I have two hosts, let's name them local and remote, and a parent
>> process on local that spawns one process on each of the two nodes.
>> After each spawn, the parent process and all existing children
>> participate in merging the newly created intercommunicator into an
>> intracommunicator that, in the end, connects all three processes.
>>
>> The weird thing is that when I spawn them in the order local, remote,
>> all three processes block at the second (and last) spawn when they
>> reach MPI_Intercomm_merge. When I switch the order around and spawn
>> the process on remote first and then the one on local, everything
>> works: the two processes are spawned and the intracommunicators are
>> created by the merge. Everything goes well, too, if I spawn both
>> processes on the same machine, whichever one it is. (The existing
>> children are informed via a message that they shall participate in
>> the spawn and merge, since these are collective operations.)
>>
>> Is there some implicit developer-level knowledge that explains why the
>> order determines the outcome? Logically, there ought to be no
>> difference.
>> Btw, I work with two Linux nodes and an ordinary Ethernet TCP
>> connection between them.
>>
>> Thanks,
>> Murat