> I would still be suspicious -- ofud is not well tested, and it can definitely
> hang if there are network drops.

It hung.

> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
> Jeff Squyres [jsquy...@cisco.com]
> Sent: September 21, 2011 17:09
> To: Open MPI Users
> Subject: Re: [OMPI users] RE : RE : Latency of 250 microseconds with
> Open-MPI 1.4.3, Mellanox Infiniband and 256 MPI ranks
>
> On Sep 21, 2011, at 4:24 PM, Sébastien Boisvert wrote:
>
>>> What happens if you run 2 ibv_rc_pingpong's on each node? Or N
>>> ibv_rc_pingpongs?
>>
>> With 11 ibv_rc_pingpong's
>>
>> http://pastebin.com/85sPcA47
>>
>> Code to do that => https://gist.github.com/1233173.
>>
>> Latencies are around 20 microseconds.
>
> This seems to imply that the network is to blame for the higher latency...?
>
> I.e., if you run the same pattern with MPI processes and get 20us latency,
> that would tend to imply that the network itself is not performing well with
> that IO pattern.
>
>> My job seems to do well so far with ofud !
>>
>> [sboisver12@colosse2 ray]$ qstat
>> job-ID   prior    name        user        state  submit/start at      queue         slots  ja-task-ID
>> -----------------------------------------------------------------------------------------------------
>> 3047460  0.55384  fish-Assem  sboisver12  r      09/21/2011 15:02:25  med@r104-n58  256
>
> I would still be suspicious -- ofud is not well tested, and it can definitely
> hang if there are network drops.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
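For anyone reproducing the N-concurrent-pingpong experiment discussed in the quoted thread, a minimal sketch follows. It is a hypothetical illustration, not the linked gist: it only prints the server/client command pairs (one port per pair) so they can be reviewed or piped to `sh`. It assumes `ibv_rc_pingpong` from the libibverbs examples, ssh access to the remote node, and a free port range starting at 18515 (the tool's default port).

```shell
#!/bin/sh
# Emit the commands to run N concurrent ibv_rc_pingpong pairs between
# this node and a remote node. Each pair needs its own TCP port for the
# connection setup handshake, so we offset from a base port.
# NOTE: names and the base port here are illustrative assumptions.

launch_pairs() {
    remote=$1        # remote hostname
    npairs=$2        # number of concurrent pingpong pairs
    base_port=18515  # ibv_rc_pingpong's default port

    i=0
    while [ "$i" -lt "$npairs" ]; do
        port=$((base_port + i))
        # server side runs on the remote node, client side locally;
        # both are backgrounded so all pairs run concurrently
        echo "ssh $remote ibv_rc_pingpong -p $port &"
        echo "ibv_rc_pingpong -p $port $remote &"
        i=$((i + 1))
    done
}

# Example: 11 pairs against one remote node, as in the thread above.
launch_pairs r104-n58 11
```

The client's output reports the measured one-way latency per iteration, so running many pairs at once shows how latency degrades when the HCA and fabric carry concurrent traffic.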