Hello all,
I am asking for help with the following situation:
I have two (mostly identical) nodes. Each of them has (completely identical)
1. QLogic 4x DDR InfiniBand, and
2. Chelsio S310E (T3 chip based) 10GbE iWARP cards.
Both are connected back-to-back, without a switch. The connection is physi...
Per the error message, can you try
mpirun --mca btl_openib_if_include cxgb3_0 --mca btl_openib_max_send_size 65536 ...
and see whether it helps?
You can also try various settings for the receive queue: for example, edit your
/.../share/openmpi/mca-btl-openib-device-params.ini and set the ...
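For instance, a per-peer receive queue sized to match a 64 KB send size could
look like the following; the section name and values are only an illustration,
so check the existing Chelsio entry in your copy of the file:

  [Chelsio T3]
  # per-peer queue: buffer size (bytes), buffer count, low watermark, credit window
  receive_queues = P,65536,256,192,128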
I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about half
the performance for MPI over TCP as I do with raw TCP. Before I start digging
into this more deeply, does anyone know what might cause that?
For what it's worth, I see the same issue with MPICH, but I do not see it ...
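For reference, the raw-TCP and MPI numbers presumably come from NetPIPE's NPtcp
and NPmpi drivers; a typical back-to-back comparison looks roughly like this
(hostnames are placeholders, and this is not necessarily the exact invocation
used here):

  # raw TCP: start the receiver on nodeB, then point the sender at it from nodeA
  ./NPtcp                    # on nodeB
  ./NPtcp -h nodeB           # on nodeA
  # MPI over TCP with Open MPI
  mpirun -np 2 -host nodeA,nodeB ./NPmpi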
Jackson,
How many Ethernet interfaces are there?
If there are several, can you try again with only one:
mpirun --mca btl_tcp_if_include eth0 ...
Cheers,
Gilles
On Tuesday, March 8, 2016, Jackson, Gary L. wrote:
>
> I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about
> half the performance for MPI over TCP as I do with raw TCP. ...
This is a bug we need to deal with. If we are getting queue pair
settings from an ini file and the max_send_size is the default value, we
should set the max send size to the size of the largest queue pair. I
will work on a fix.
-Nathan
On Tue, Mar 08, 2016 at 03:57:39PM +0900, Gilles Gouaillardet
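A quick way to see how the two settings compare on a given build (the exact
output format varies between releases):

  ompi_info --param btl openib --level 9 | grep -E 'max_send_size|receive_queues'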
See https://github.com/open-mpi/ompi/pull/1439
I was seeing this problem when enabling CUDA support as it sets
btl_openib_max_send_size to 128k but does not change the receive queue
settings. Tested the commit in #1439 and it fixes the issue for me.
-Nathan
On Tue, Mar 08, 2016 at 03:57:39PM +0900, Gilles Gouaillardet
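Until a release carries that fix, a workaround along the lines of the earlier
suggestion is either to cap the send size at the largest default receive queue,
or to add a matching 128 KB shared receive queue; the queue sizes below are the
usual defaults plus one extra entry, shown only as an illustration:

  mpirun --mca btl_openib_max_send_size 65536 ...
  # or
  mpirun --mca btl_openib_receive_queues \
      P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64:S,131072,1024,1008,64 ...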
Nope, just one Ethernet interface:
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 0E:47:0E:0B:59:27
          inet addr:xxx.xxx.xxx.xxx  Bcast:xxx.xxx.xxx.xxx  Mask:255.255.252.0
          inet6 addr: fe80::c47:eff:fe0b:5927/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric...
Jackson,
I am surprised by the MTU value ...
IIRC, the MTU for Ethernet jumbo frames is 9000, not 9001.
Can you run tracepath on both boxes (to check which MTU is actually used)?
Then, can you try to set MTU=1500 on both boxes
(warning: get ready to lose the connection) and try again with
Open MPI and Intel ...
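For example (interface name and address are placeholders; run on both ends):

  tracepath <other-node-ip>              # shows the path MTU actually in use
  sudo ip link set dev eth0 mtu 1500     # temporarily drop the MTU (may cut your session)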
If you are using instance types that support SR-IOV (a.k.a. "enhanced
networking" in AWS), then turn it on. We saw huge differences when SR-IOV
is enabled:
http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cl
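For what it's worth, you can check whether it is already active from inside the
instance and from the AWS CLI (the instance ID is a placeholder, and enabling
the attribute requires the instance to be stopped first):

  ethtool -i eth0        # driver should be ixgbevf when SR-IOV is active
  aws ec2 describe-instance-attribute --instance-id i-xxxxxxxx --attribute sriovNetSupport
  aws ec2 modify-instance-attribute --instance-id i-xxxxxxxx --sriov-net-support simple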
Jackson,
One more thing: how did you build Open MPI?
If you built from git (and without VPATH), then --enable-debug is
automatically set, and this hurts performance.
If not already done, I recommend you download the latest Open MPI tarball
(1.10.2) and
./configure --with-platform=contrib/...
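For example, to check whether the current installation was built with debugging
enabled and to configure an optimized rebuild (the platform file path is the
usual one shipped in the tarball; adjust as needed):

  ompi_info | grep -i 'debug support'    # an optimized build reports "Internal debug support: no"
  ./configure --with-platform=contrib/platform/optimized ...
  make -j 8 && make install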