[OMPI users] iWARP usage issue

2016-03-08 Thread dpchoudh .
Hello all

I am asking for help for the following situation:

I have two (mostly identical) nodes. Each of them has (completely
identical):
1. a QLogic 4x DDR InfiniBand card, AND
2. a Chelsio S310E (T3-based) 10GbE iWARP card.

Both are connected back-to-back, without a switch. The connection is
physically OK and IP traffic can flow on both of them without issues.

The issue is that I can run MPI programs over the openib BTL using the QLogic
card, but not the Chelsio card. Here are the commands:

[durga@smallMPI ~]$ ibv_devices
    device                 node GUID
    ------              ----------------
    cxgb3_0             00074306cd3b     <-- Chelsio
    qib0                001175ff831d     <-- Qlogic

The following command works:

 mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include qib0
./osu_acc_latency

And the following do not:
mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include cxgb3_0
./osu_acc_latency

mpirun -np 2 --hostfile ~/hostfile -mca pml ob1 -mca btl_openib_if_include
cxgb3_0 ./osu_acc_latency

mpirun -np 2 --hostfile ~/hostfile -mca pml ^cm -mca btl_openib_if_include
cxgb3_0 ./osu_acc_latency

The error I get is the following (in all of the non-working cases):

WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment.  The OpenFabrics (openib) BTL will therefore be
deactivated for this run.

  Local host: smallMPI
  Largest buffer size: 65536
  Maximum send fragment size: 131072
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:   bigMPI
  Local device: cxgb3_0
  Local port:   1
  CPCs attempted:   udcm
--------------------------------------------------------------------------

I have a vague understanding of what the message is trying to say, but I do
not know which file or configuration parameters to change to fix the
situation.

Thanks in advance
Durga


Life is complex. It has real and imaginary parts.


Re: [OMPI users] iWARP usage issue

2016-03-08 Thread Gilles Gouaillardet

Per the error message, can you try

mpirun --mca btl_openib_if_include cxgb3_0 --mca btl_openib_max_send_size 65536 ...

and see whether it helps?

You can also try various settings for the receive queues: for example,
edit your /.../share/openmpi/mca-btl-openib-device-params.ini and set
the parameters for your specific hardware.
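For reference, a device section in that ini file looks roughly like the sketch
below. The vendor/part IDs and queue sizes here are illustrative guesses, not
necessarily the values shipped for the T3, so adjust the existing [Chelsio T3]
entry in your install rather than copying these numbers. The relevant point,
per the warning you got, is that the largest buffer size listed in
receive_queues must be at least as large as btl_openib_max_send_size (131072
in your output), and older iWARP adapters generally need per-peer (P) queues
rather than shared receive queues:

    [Chelsio T3]
    vendor_id = 0x1425
    vendor_part_id = 0x0030,0x0031,0x0032
    use_eager_rdma = 1
    mtu = 2048
    receive_queues = P,65536,256,192,128:P,131072,64,32,16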


Cheers,

Gilles





[OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Jackson, Gary L.

I've built Open MPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about half
the performance for MPI over TCP as I do with raw TCP. Before I start digging
into this more deeply, does anyone know what might cause that?

For what it's worth, I see the same issues with MPICH, but I do not see it with 
Intel MPI.
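For reference, a minimal way to reproduce such a comparison is sketched below.
This assumes the standard NetPIPE binaries (NPtcp for raw TCP, NPmpi for the
MPI build) and a two-host ~/hostfile; hostnames are placeholders and the exact
options may differ from the runs reported above:

    # raw TCP: start the receiver on one node, then the transmitter on the other
    node1$ NPtcp
    node2$ NPtcp -h node1

    # MPI over TCP with Open MPI, restricted to the TCP BTL
    $ mpirun -np 2 --hostfile ~/hostfile --mca btl tcp,self ./NPmpi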

--
Gary Jackson



Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Gilles Gouaillardet
Jackson,

how many Ethernet interfaces are there? If there are several, can you try
again with only one:

mpirun --mca btl_tcp_if_include eth0 ...

Cheers,

Gilles



Re: [OMPI users] iWARP usage issue

2016-03-08 Thread Nathan Hjelm

This is a bug we need to deal with. If we are getting queue pair settings
from an ini file and the max_send_size is the default value, we should set
the max send size to the size of the largest queue pair. I will work on a
fix.

-Nathan



Re: [OMPI users] iWARP usage issue

2016-03-08 Thread Nathan Hjelm

See https://github.com/open-mpi/ompi/pull/1439

I was seeing this problem when enabling CUDA support, since that sets
btl_openib_max_send_size to 128k but does not change the receive queue
settings. I tested the commit in #1439 and it fixes the issue for me.

-Nathan



Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Jackson, Gary L.
Nope, just one Ethernet interface:

$ ifconfig
eth0  Link encap:Ethernet  HWaddr 0E:47:0E:0B:59:27
  inet addr:xxx.xxx.xxx.xxx  Bcast:xxx.xxx.xxx.xxx
Mask:255.255.252.0
  inet6 addr: fe80::c47:eff:fe0b:5927/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
  RX packets:16962 errors:0 dropped:0 overruns:0 frame:0
  TX packets:11564 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:28613867 (27.2 MiB)  TX bytes:1092650 (1.0 MiB)

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:68 errors:0 dropped:0 overruns:0 frame:0
  TX packets:68 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:6647 (6.4 KiB)  TX bytes:6647 (6.4 KiB)


-- 
Gary Jackson







Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Gilles Gouaillardet
Jackson,

I am surprised by the MTU value ...
IIRC, the MTU for Ethernet jumbo frames is 9000, not 9001.

Can you run tracepath on both boxes (to check which MTU is actually used)?

Then, can you try setting MTU=1500 on both boxes
(warning: get ready to lose the connection) and try again with
Open MPI and Intel MPI?
Then, can you increase the MTU to 6000 and then 9000 and see how things evolve?
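For concreteness, a hedged sequence for those checks might look like the
following (assuming the interface is eth0 and the other instance is reachable
as node2; the names are placeholders, and the MTU change does not survive a
reboot):

    # check the path MTU actually negotiated between the two instances
    $ tracepath node2

    # temporarily lower the interface MTU, then re-run the benchmark
    $ sudo ip link set dev eth0 mtu 1500
    $ mpirun -np 2 --hostfile ~/hostfile --mca btl_tcp_if_include eth0 ./NPmpi

    # repeat with larger MTUs and compare
    $ sudo ip link set dev eth0 mtu 6000
    $ sudo ip link set dev eth0 mtu 9000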

Also, did you configure Open MPI with IPv6 support?


Cheers,

Gilles



Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Rayson Ho
If you are using instance types that support SR-IOV (aka "enhanced
networking" in AWS), then turn it on. We saw huge differences when SR-IOV
is enabled:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html

Make sure you start your instances with a placement group -- otherwise, the
instances can be data centers apart!

And check that jumbo frames are enabled properly:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
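A few hedged commands for those checks (the instance ID and interface name
below are placeholders; this assumes the AWS CLI is configured and that the
enhanced-networking instance types of that era use the ixgbevf driver):

    # from a machine with AWS CLI access: is enhanced networking flagged on?
    $ aws ec2 describe-instance-attribute --instance-id i-xxxxxxxx --attribute sriovNetSupport

    # on the instance itself: which driver is bound, and what MTU is in effect?
    $ ethtool -i eth0
    $ ip link show eth0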

But still, it is interesting that Intel MPI is getting a 2X speedup with
the same setup! Can you post the raw numbers so that we can take a deeper
look?

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html






Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Gilles Gouaillardet

Jackson,

one more thing: how did you build Open MPI?

If you built from git (and without VPATH), then --enable-debug is
automatically set, and this hurts performance.
If not already done, I recommend you download the latest Open MPI tarball
(1.10.2) and

./configure --with-platform=contrib/platform/optimized --prefix=...

Last but not least, you can

mpirun --mca mpi_leave_pinned 1

(that being said, I am not sure this is useful with TCP networks ...)
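To check whether the current build already has debug enabled before
rebuilding, something like the following should work (the exact label in the
ompi_info output can vary slightly between versions, and the prefix below is
just a placeholder):

    # a debug build typically reports "Internal debug support: yes"
    $ ompi_info | grep -i debug

    # rebuild from the 1.10.2 tarball with the optimized platform file
    $ ./configure --with-platform=contrib/platform/optimized --prefix=$HOME/ompi-1.10.2
    $ make -j 8 && make install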

Cheers,

Gilles






