The Open MPI 1.4 series is now deprecated. Can you upgrade to Open MPI 1.6?
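Before upgrading, it may help to confirm which Open MPI version is actually picked up at run time and whether the PSM MTL component is built into it. The following is a minimal sketch, not a definitive procedure. It assumes `ompi_info` is on the PATH and prints a line of the form `Open MPI: <version>`. The helper name `ompi_older_than_1_6` is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: returns success (0) when the given Open MPI
# version string (e.g. "1.4.3") is older than 1.6.
ompi_older_than_1_6() {
  ver="$1"
  major=${ver%%.*}              # text before the first dot
  rest=${ver#*.}                # text after the first dot
  minor=${rest%%.*}             # second version component
  minor=${minor%%[!0-9]*}       # drop suffixes such as "rc1"
  [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 6 ]; }
}

# Only query the installation if ompi_info is available (assumption:
# its output contains a line such as "Open MPI: 1.4.3").
if command -v ompi_info >/dev/null 2>&1; then
  ver=$(ompi_info | awk -F': *' '/^ *Open MPI:/ {print $2; exit}')
  if [ -n "$ver" ]; then
    if ompi_older_than_1_6 "$ver"; then
      echo "Open MPI $ver is older than 1.6"
    else
      echo "Open MPI $ver is 1.6 or newer"
    fi
  fi
  # List any PSM-related components in this build (no output if none).
  ompi_info | grep -i psm || true
fi
```

Running this after `module load openmpi/1.4.3-gcc` should show whether the loaded module is the 1.4 series and whether a `psm` MTL component is present in that build.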

On Jun 29, 2012, at 9:02 AM, Sébastien Boisvert wrote:

> I am using Open-MPI 1.4.3 compiled with gcc 4.5.3.
>
> The library:
>
> /usr/lib64/libpsm_infinipath.so.1.14: ELF 64-bit LSB shared object, AMD
> x86-64, version 1 (SYSV), not stripped
>
>
> Jeff Squyres wrote:
>> Yes, PSM is the native transport for InfiniPath. It is faster than the
>> InfiniBand verbs support on the same hardware.
>>
>> What version of Open MPI are you using?
>>
>>
>> On Jun 28, 2012, at 10:03 PM, Sébastien Boisvert wrote:
>>
>>> Hello,
>>>
>>> I am getting random crashes (segmentation faults) on a supercomputer
>>> (guillimin) using 3 nodes with 12 cores per node. The same program (Ray)
>>> runs without any problem on the other supercomputers I use.
>>>
>>> The interconnect is "InfiniBand: QLogic Corp. InfiniPath QME7342 QDR HCA"
>>> and the messages transit using "performance scaled messaging" (PSM), which
>>> I think is some sort of replacement for InfiniBand verbs, although I am
>>> not sure.
>>>
>>> Adding '--mca mtl ^psm' to the Open-MPI mpiexec program options solves
>>> the problem, but increases the latency from 20 microseconds to 55
>>> microseconds.
>>>
>>> There seems to be some sort of message corruption during the transit, but
>>> I cannot rule out other explanations.
>>>
>>> I have no idea what is going on and why disabling PSM solves the problem.
>>>
>>>
>>> Versions
>>>
>>> module load gcc/4.5.3
>>> module load openmpi/1.4.3-gcc
>>>
>>>
>>> Command that randomly crashes
>>>
>>> mpiexec -n 36 -output-filename MiSeq-bug-2012-06-28.1 \
>>>   Ray -k 31 \
>>>   -o MiSeq-bug-2012-06-28.1 \
>>>   -p \
>>>   data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R1.fastq \
>>>   data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R2.fastq
>>>
>>>
>>> Command that completes successfully
>>>
>>> mpiexec -n 36 -output-filename psm-bug-2012-06-26-hotfix.1 \
>>>   --mca mtl ^psm \
>>>   Ray -k 31 \
>>>   -o psm-bug-2012-06-26-hotfix.1 \
>>>   -p \
>>>   data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R1.fastq \
>>>   data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R2.fastq
>>>
>>>
>>> Sébastien Boisvert
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/