Thanks, that was actually a lot of help. I had very little understanding of --bynode and --byslot before.
On 6/5/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>
> On May 23, 2008, at 9:07 PM, Cally K wrote:
>
> > Hi, I have a question about --bynode and --byslot that I would like
> > to clarify.
> >
> > Say, for example, I have a hostfile:
> >
> > #Hostfile
> > __________________________
> > node0
> > node1 slots=2 max_slots=2
> > node2 slots=2 max_slots=2
> > node3 slots=4 max_slots=4
> > ___________________________
> >
> > There are 4 nodes and 9 slots. How do I run my mpirun? For now I use
> >
> > a) mpirun -np --bynode 4 ./abcd
>
> I assume you mean "... -np 4 --bynode ..."
>
> > I know that the slots setting is for SMPs, and I have tried running
> >
> > mpirun -np --byslot 9 ./abcd
> >
> > and I noticed that it takes longer when I do --byslot compared to
> > --bynode.
>
> According to your text, you're running 9 processes when using --byslot
> and 4 when using --bynode. Is that a typo? I'll assume that it is --
> that you meant to use 9 in both cases.
>
> > And I just read the FAQ that says --byslot is used by default, so I
> > don't have to specify it, right?
>
> I'm not sure what your question is. The actual performance may depend
> on your application and what its communication and computation
> patterns are. It gets more difficult to model when you have a
> heterogeneous setup (like it looks like you have, per your hostfile).
>
> Let's take your example of 9 processes.
>
> - With --bynode, the MPI_COMM_WORLD ranks will be laid out as follows
>   (MCWR = "MPI_COMM_WORLD rank"):
>
>   node0: MCWR 0
>   node1: MCWR 1, MCWR 4
>   node2: MCWR 2, MCWR 5
>   node3: MCWR 3, MCWR 6, MCWR 7, MCWR 8
>
> - With --byslot, it'll look like this:
>
>   node0: MCWR 0
>   node1: MCWR 1, MCWR 2
>   node2: MCWR 3, MCWR 4
>   node3: MCWR 5, MCWR 6, MCWR 7, MCWR 8
>
> In short, OMPI is doing round-robin placement of your processes; the
> only difference is which dimension is traversed first: by node or
> by slot.
>
> As to why there's such a performance difference, it could depend on a
> lot of things: the difference in computational speed and/or RAM on
> your 4 nodes, the changing communication patterns between the two
> (shared memory is usually used for on-node communication, which is
> usually faster than most networks), etc. It really depends on what
> your application is *doing*.
>
> Sorry I can't be of more help...
>
> --
> Jeff Squyres
> Cisco Systems
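For anyone who wants to check the placement on their own cluster, here is a
minimal sketch (the file name and output format are my own choices, not from
the thread) that makes each rank report which host it landed on:

/* placement.c -- print where each MPI_COMM_WORLD rank landed.
 * A minimal sketch: compile with  mpicc placement.c -o placement
 * and run with  mpirun -np 9 --bynode ./placement  (or --byslot).
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's MCWR */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(host, &len);     /* name of the node we run on */

    /* Each rank reports its own placement. */
    printf("MCWR %d of %d runs on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}

Running it twice, once with --bynode and once with --byslot, should
reproduce the two rank-to-node layouts Jeff listed above.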