Hello,
The 20-microsecond latency is the round-trip time for 4000-byte messages:
from MPI rank A to MPI rank B and back to MPI rank A.
The one-way latency is therefore 10 microseconds.
For 1-byte messages, the one-way latency
from MPI rank A to MPI rank B is already below 3 microseconds.
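For reference, the round trip is measured with a ping-pong pattern between two ranks. Here is a minimal sketch in C of that kind of measurement (illustrative only, not the exact code the program uses; the message size and iteration count are arbitrary):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MESSAGE_BYTES 4000
#define ITERATIONS 1000

int main(int argc, char **argv)
{
    char buffer[MESSAGE_BYTES];
    double start, elapsed;
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buffer, 0, MESSAGE_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();
    for (i = 0; i < ITERATIONS; i++) {
        if (rank == 0) {
            /* rank A: send the message, then wait for the echo */
            MPI_Send(buffer, MESSAGE_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buffer, MESSAGE_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* rank B: receive the message, echo it back */
            MPI_Recv(buffer, MESSAGE_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buffer, MESSAGE_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    elapsed = MPI_Wtime() - start;

    if (rank == 0) {
        /* average round trip; the one-way latency is half of it */
        printf("round trip: %.2f us, one way: %.2f us\n",
               elapsed * 1e6 / ITERATIONS,
               elapsed * 1e6 / ITERATIONS / 2.0);
    }

    MPI_Finalize();
    return 0;
}

Compile and run with two ranks, e.g.: mpicc pingpong.c -o pingpong && mpiexec -n 2 ./pingpong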
I will contact you off-list.
Thank you.
Elken, Tom wrote:
Hi Sebastien,
The Infinipath / PSM software that was developed by PathScale/QLogic is now
part of Intel.
I'll advise you off-list about how to contact our customer support so we can
gather information about your software installation and work to resolve your
issue.
The 20-microsecond latency you are getting with Open MPI / PSM is still way
too high, so there may be some network issue that needs to be solved first.
-Tom
-----Original Message-----
From: users-boun...@open-mpi.org On Behalf Of Sébastien Boisvert
Sent: Friday, June 29, 2012 10:56 AM
To: Open MPI Users
Subject: Re: [OMPI users] Performance scaled messaging and random crashes
Hi,
Thank you for the direction.
I installed Open MPI 1.6, and the program also crashes with 1.6.
Could there be a bug in my code? I don't see how disabling PSM would make the
bug go away if the bug is in my code.
Open MPI configure command

module load gcc/4.5.3
./configure \
    --prefix=/sb/project/nne-790-ab/software/Open-MPI/1.6/Build \
    --with-openib \
    --with-psm \
    --with-tm=/software/tools/torque/ \
    | tee configure.log
Versions
module load gcc/4.5.3
module load /sb/project/nne-790-ab/software/modulefiles/mpi/Open-MPI/1.6
module load /sb/project/nne-790-ab/software/modulefiles/apps/ray/2.0.0
PSM parameters
guillimin> ompi_info -a | grep psm
  MCA mtl: psm (MCA v2.0, API v2.0, Component v1.6)
  MCA mtl: parameter "mtl_psm_connect_timeout" (current value: <180>, data source: default value)
  MCA mtl: parameter "mtl_psm_debug" (current value: <1>, data source: default value)
  MCA mtl: parameter "mtl_psm_ib_unit" (current value: <-1>, data source: default value)
  MCA mtl: parameter "mtl_psm_ib_port" (current value: <0>, data source: default value)
  MCA mtl: parameter "mtl_psm_ib_service_level" (current value: <0>, data source: default value)
  MCA mtl: parameter "mtl_psm_ib_pkey" (current value: <32767>, data source: default value)
  MCA mtl: parameter "mtl_psm_ib_service_id" (current value: <0x1000117500000000>, data source: default value)
  MCA mtl: parameter "mtl_psm_path_query" (current value: <none>, data source: default value)
  MCA mtl: parameter "mtl_psm_priority" (current value: <0>, data source: default value)
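If I understand the MCA system correctly, any of these parameters can be overridden at run time with the standard --mca syntax, or through the environment; the value below is only hypothetical, to illustrate the mechanism:

mpiexec --mca mtl_psm_connect_timeout 360 ...
export OMPI_MCA_mtl_psm_connect_timeout=360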
Thank you.
Sébastien Boisvert
Jeff Squyres wrote:
The Open MPI 1.4 series is now deprecated. Can you upgrade to Open MPI
1.6?
On Jun 29, 2012, at 9:02 AM, Sébastien Boisvert wrote:
I am using Open MPI 1.4.3 compiled with gcc 4.5.3.
The library:
/usr/lib64/libpsm_infinipath.so.1.14: ELF 64-bit LSB shared object,
AMD x86-64, version 1 (SYSV), not stripped
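(The line above is the output of the file command on the library, i.e.: file /usr/lib64/libpsm_infinipath.so.1.14)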
Jeff Squyres wrote:
Yes, PSM is the native transport for InfiniPath. It is faster than the
InfiniBand verbs support on the same hardware.
What version of Open MPI are you using?
On Jun 28, 2012, at 10:03 PM, Sébastien Boisvert wrote:
Hello,
I am getting random crashes (segmentation faults) on a supercomputer
(guillimin) using 3 nodes with 12 cores per node. The same
program (Ray) runs without any problem on the other supercomputers I
use.
The interconnect is "InfiniBand: QLogic Corp. InfiniPath QME7342
QDR HCA", and the messages transit using Performance Scaled
Messaging (PSM), which I think is some sort of replacement for InfiniBand
verbs, although I am not sure.
Adding '--mca mtl ^psm' to the Open MPI mpiexec options
solves the problem, but increases the latency from 20 microseconds to 55
microseconds.
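(For reference, if I read the documentation correctly, the transport selection can be forced explicitly in either direction with the standard MCA selection syntax; the rest of the command line is elided here:

mpiexec --mca mtl psm ...     # force the PSM MTL
mpiexec --mca mtl ^psm ...    # exclude PSM; Open MPI falls back to verbs
)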
There seems to be some sort of message corruption in transit, but I cannot
rule out other explanations.
I have no idea what is going on and why disabling PSM solves the problem.
Versions
module load gcc/4.5.3
module load openmpi/1.4.3-gcc
Command that randomly crashes
mpiexec -n 36 -output-filename MiSeq-bug-2012-06-28.1 \
    Ray -k 31 \
    -o MiSeq-bug-2012-06-28.1 \
    -p \
    data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R1.fastq \
    data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R2.fastq
Command that completes successfully
mpiexec -n 36 -output-filename psm-bug-2012-06-26-hotfix.1 \
    --mca mtl ^psm \
    Ray -k 31 \
    -o psm-bug-2012-06-26-hotfix.1 \
    -p \
    data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R1.fastq \
    data-for-system-tests/ecoli-MiSeq/MiSeq_Ecoli_MG1655_110527_R2.fastq
Sébastien Boisvert
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users