The Open MPI 1.4 series is now deprecated.  Can you upgrade to Open MPI 1.6?
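
To tell whether an upgrade changes anything, it may help to run the same job with and without PSM under each Open MPI version. Below is a minimal dry-run sketch, assuming a POSIX shell. The helper names `mtl_args` and `ray_command` are hypothetical, and the Ray arguments are copied from the commands quoted below. It only prints the command lines and does not launch anything.

```shell
#!/bin/sh
# Input files, copied from the commands quoted in this thread.
R1=data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R1.fastq
R2=data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R2.fastq

# mtl_args MODE: print the extra mpiexec options for one run (hypothetical helper).
#   psm   -> no extra options, as in the crashing command
#   nopsm -> exclude the PSM MTL, as in the workaround command
mtl_args() {
    case "$1" in
        psm)   ;;
        nopsm) echo "--mca mtl ^psm" ;;
        *)     echo "usage: mtl_args psm|nopsm" >&2; return 1 ;;
    esac
}

# ray_command MODE TAG: print the full mpiexec command line for one run.
ray_command() {
    mode=$1
    tag=$2
    extra=$(mtl_args "$mode") || return 1
    # $extra is intentionally unquoted so it splits into separate words.
    echo mpiexec -n 36 -output-filename "$tag" $extra \
        Ray -k 31 -o "$tag" -p "$R1" "$R2"
}

# Print both variants; the tags "with-psm" and "without-psm" are placeholders.
ray_command psm with-psm
ray_command nopsm without-psm
```

Running each printed command several times under both versions would show whether the random crash follows PSM or the Open MPI version.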


On Jun 29, 2012, at 9:02 AM, Sébastien Boisvert wrote:

> I am using Open-MPI 1.4.3 compiled with gcc 4.5.3.
> 
> The library:
> 
> /usr/lib64/libpsm_infinipath.so.1.14: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
> 
> 
> 
> Jeff Squyres wrote:
>> Yes, PSM is the native transport for InfiniPath.  It is faster than the 
>> InfiniBand verbs support on the same hardware.
>> 
>> What version of Open MPI are you using?
>> 
>> 
>> On Jun 28, 2012, at 10:03 PM, Sébastien Boisvert wrote:
>> 
>>> Hello,
>>> 
>>> I am getting random crashes (segmentation faults) on a supercomputer (guillimin)
>>> using 3 nodes with 12 cores per node. The same program (Ray) runs without any
>>> problem on the other supercomputers I use.
>>> 
>>> The interconnect is "InfiniBand: QLogic Corp. InfiniPath QME7342 QDR HCA", and
>>> the messages transit using "performance scaled messaging" (PSM), which I think is
>>> some sort of replacement for InfiniBand verbs, although I am not sure.
>>> 
>>> Adding '--mca mtl ^psm' to the Open-MPI mpiexec program options solves
>>> the problem, but increases the latency from 20 microseconds to 55 microseconds.
>>> 
>>> There seems to be some sort of message corruption during transit, but I cannot
>>> rule out other explanations.
>>> 
>>> 
>>> I have no idea what is going on and why disabling PSM solves the problem.
>>> 
>>> 
>>> Versions
>>> 
>>> module load gcc/4.5.3
>>> module load openmpi/1.4.3-gcc
>>> 
>>> 
>>> Command that randomly crashes
>>> 
>>> mpiexec -n 36 -output-filename MiSeq-bug-2012-06-28.1 \
>>> Ray -k 31 \
>>> -o MiSeq-bug-2012-06-28.1 \
>>> -p \
>>> data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R1.fastq \
>>> data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R2.fastq
>>> 
>>> 
>>> Command that completes successfully
>>> 
>>> mpiexec -n 36 -output-filename psm-bug-2012-06-26-hotfix.1 \
>>> --mca mtl ^psm \
>>> Ray -k 31 \
>>> -o psm-bug-2012-06-26-hotfix.1 \
>>> -p \
>>> data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R1.fastq \
>>> data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R2.fastq
>>> 
>>> 
>>> 
>>> Sébastien Boisvert
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

