Hello Howard,

I was wondering if you have been able to look at this issue at all, or if 
anyone has any ideas on what to try next.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Brendan Myers
Sent: Tuesday, January 24, 2017 11:11 AM
To: 'Open MPI Users' <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Howard,

Here is the error output after building with debug enabled. These Mellanox ConnectX-4 cards present each port as a separate device, and I am using port 1 on the card, which is device mlx5_0.
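
For anyone reproducing this, the port-to-device mapping can be confirmed with the standard libibverbs utilities; a minimal sketch (the device name comes from the description above):

    # list the RDMA devices visible on the node
    ibv_devices

    # show state, link layer, and rate for port 1 of mlx5_0
    ibv_devinfo -d mlx5_0 -i 1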

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, January 24, 2017 8:21 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Brendan,

 

This helps some, but looks like we need more debug output.

 

Could you build a debug version of Open MPI by adding --enable-debug to the configure options, then rerun the test with the breakout cable setup, keeping the --mca btl_base_verbose 100 command line option?
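
A minimal sketch of that rebuild (the install prefix here is an assumption; adjust to your layout):

    # from the Open MPI 2.0.1 source tree: reconfigure with debug support
    ./configure --enable-debug --prefix=/opt/openmpi-2.0.1-dbg
    make -j8 && make install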

 

Thanks,

 

Howard

 

 

2017-01-23 8:23 GMT-07:00 Brendan Myers <brendan.my...@soft-forge.com>:

Hello Howard,

Thank you for looking into this. Attached is the output you requested.  Also, I 
am using Open MPI 2.0.1.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, January 20, 2017 6:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hi Brendan

 

I doubt this kind of config has gotten any testing with OMPI.  Could you rerun 
with

 

--mca btl_base_verbose 100

 

added to the command line and post the output to the list?
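
For reference, a sketch of the amended command line, reusing the hostfile and benchmark path from the original report below (the log file name is just a suggestion):

    mpirun --mca btl openib,self,sm \
           --mca btl_openib_receive_queues P,65536,120,64,32 \
           --mca btl_openib_cpc_include rdmacm \
           --mca btl_base_verbose 100 \
           -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1 2>&1 | tee btl-verbose.log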

 

Howard

 

 

Brendan Myers <brendan.my...@soft-forge.com> wrote on Fri., Jan. 20, 2017 at 15:04:

Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a single 
breakout cable with this design:

(100GbE) QSFP <----> 2x (50GbE) QSFP

 

Hardware Layout:

Breakout cable module A connects to switch (100GbE)

Breakout cable module B1 connects to node 1 RoCE NIC (50GbE)

Breakout cable module B2 connects to node 2 RoCE NIC (50GbE)

The switch is a Mellanox SN2700 100GbE RoCE switch.

 

* I am able to pass RDMA traffic between the nodes with perftest (ib_write_bw) when using the breakout cable as the interconnect from both nodes to the switch (example invocation below).

* When attempting to run a job using the breakout cable as the interconnect, Open MPI aborts with errors about failing to initialize the OpenFabrics device.

* If I replace the breakout cable with 2 standard QSFP cables, the Open MPI job completes correctly.
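
For reference, a sketch of that perftest run (device and port as identified elsewhere in this thread; the server address 192.0.2.1 is a hypothetical placeholder):

    # node 1 (server): listen on port 1 of mlx5_0, using rdma_cm for connection setup
    ib_write_bw -d mlx5_0 -i 1 -R

    # node 2 (client): connect to node 1's address on the RoCE interface
    ib_write_bw -d mlx5_0 -i 1 -R 192.0.2.1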

 

 

This is the command I use; it works unless I attempt a run with the breakout cable as the interconnect:

mpirun --mca btl openib,self,sm \
       --mca btl_openib_receive_queues P,65536,120,64,32 \
       --mca btl_openib_cpc_include rdmacm \
       -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1
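
For readers, my reading of those options (annotations are mine, not wording from this thread):

    # --mca btl openib,self,sm                select the OpenFabrics (openib),
    #                                         self, and shared-memory BTLs
    # --mca btl_openib_receive_queues P,65536,120,64,32
    #                                         a single per-peer (P) receive queue
    #                                         with 64 KiB buffers
    # --mca btl_openib_cpc_include rdmacm     use the rdmacm connection manager,
    #                                         the one that supports RoCE
    # -hostfile mpi-hosts-ce                  the two-node host list
    # /usr/local/bin/IMB-MPI1                 Intel MPI Benchmarks binary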

 

If anyone has any idea as to why using a breakout cable is causing my jobs to fail, please let me know.

 

Thank you,

 

Brendan T. W. Myers

brendan.my...@soft-forge.com

Software Forge Inc

 

_______________________________________________

users mailing list

users@lists.open-mpi.org

https://rfd.newmexicoconsortium.org/mailman/listinfo/users

