Going on the assumption that it was a copy/paste error, the next question is: how many nodes were in your allocation?
I ask because there is a change in the way we launch between 1.8 and 1.6.5. Starting in the 1.7 series, mpirun launches daemons across your allocation at startup so it can collect information on the topology of the nodes in the allocation - this info is then used when mapping the job. In the 1.6 series, we only launched daemons on the nodes actually being used. So 1.6.5 would indeed be faster IF you have a large allocation but only launch a small number of procs.

What you can do to compensate is add the --novm option to mpirun (or use the "state_novm_select=1" MCA param), which reverts to the 1.6.5 behavior - a quick sketch of the invocation is included after the quoted thread below.

On Apr 10, 2014, at 7:00 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Just to ensure I understand what you are saying: it appears that 1.8 is much
> faster than 1.6.5 with the default settings, but slower when you set
> btl=tcp,self?
>
> This seems rather strange. I note that the 1.8 value is identical in the two
> cases, but somehow 1.6.5 went much faster in the latter case. Is this a
> copy/paste error?
>
>
> On Apr 10, 2014, at 2:05 AM, Victor Vysotskiy <victor.vysots...@teokem.lu.se>
> wrote:
>
>> Dear Developers,
>>
>> I have run into a performance degradation on a multi-core, single-processor
>> machine. Specifically, in the most recent Open MPI v1.8 the initialization
>> and process startup stage became ~10x slower compared to v1.6.5. In order to
>> measure timings I have used the following code snippet:
>>
>> /*-------------------------------------------*/
>> #include <mpi.h>
>>
>> int main (int argc, char *argv[]) {
>>
>>   MPI_Init(&argc, &argv);
>>   MPI_Finalize();
>>
>>   return 0;
>> }
>> /*-------------------------------------------*/
>>
>> The execution wall time has been measured in a trivial way by using the
>> 'time' command, i.e.:
>>
>>   time mpirun -np 2 ./a.out
>>
>> Below are the averaged timings for both versions on Linux x86_64 (Intel
>> i7-3630):
>>
>> Default settings:
>>   1.8  : 0.679 s
>>   1.6.5: 1.041 s
>>
>> OMPI_MCA_btl=tcp,self:
>>   1.8  : 0.679 s
>>   1.6.5: 0.041 s
>>
>> The same problem has been detected on Mac OS X v10.9.2.
>>
>> Here I should stress that other MPI distributions perform like Open MPI
>> v1.6.5 with the TCP byte transfer layer activated.
>>
>> So, I am wondering whether it is possible to tune v1.8 in order to speed up
>> the startup process? The problem is that during the automatic nightly
>> verification of our program we usually spawn parallel binaries thousands
>> of times.
>>
>> Thank you in advance!
>>
>> Best regards,
>> Victor.
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
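
For reference, here is a minimal sketch of how the suggested workaround could be applied to the test case from this thread. The --novm flag and the state_novm_select MCA param are as described above; the "-np 2 ./a.out" invocation is taken from Victor's example, and the OMPI_MCA_ environment-variable form follows the same convention as the OMPI_MCA_btl setting he used. This is a sketch only, not verified against a 1.8 build:

  # revert to the 1.6.5-style launch with the command-line option
  time mpirun --novm -np 2 ./a.out

  # the same effect via the MCA parameter on the command line
  time mpirun -mca state_novm_select 1 -np 2 ./a.out

  # or exported as an environment variable, e.g. in a nightly test script
  export OMPI_MCA_state_novm_select=1
  time mpirun -np 2 ./a.out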