Hello Howard,

I was wondering if you have been able to look at this issue at all, or if anyone has ideas on what to try next.
Thank you,
Brendan

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Brendan Myers
Sent: Tuesday, January 24, 2017 11:11 AM
To: 'Open MPI Users' <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

Hello Howard,

Here is the error output after building with debug enabled. These CX4 Mellanox cards present each port as a separate device, and I am using port 1 on the card, which is device mlx5_0.

Thank you,
Brendan

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, January 24, 2017 8:21 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

Hello Brendan,

This helps some, but it looks like we need more debug output. Could you build a debug version of Open MPI by adding --enable-debug to the configure options, then rerun the test with the breakout cable setup, keeping the --mca btl_base_verbose 100 command line option?

Thanks,
Howard

2017-01-23 8:23 GMT-07:00 Brendan Myers <brendan.my...@soft-forge.com>:

Hello Howard,

Thank you for looking into this. Attached is the output you requested. Also, I am using Open MPI 2.0.1.

Thank you,
Brendan

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, January 20, 2017 6:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

Hi Brendan,

I doubt this kind of config has gotten any testing with OMPI. Could you rerun with --mca btl_base_verbose 100 added to the command line and post the output to the list?

Howard

Brendan Myers <brendan.my...@soft-forge.com> wrote on Fri, 20 Jan 2017 at 15:04:

Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a single breakout cable with this design:

    (100GbE) QSFP <----> 2x (50GbE) QSFP

Hardware layout:
* Breakout cable module A connects to the switch (100GbE)
* Breakout cable module B1 connects to the node 1 RoCE NIC (50GbE)
* Breakout cable module B2 connects to the node 2 RoCE NIC (50GbE)
* The switch is a Mellanox SN2700 100GbE RoCE switch

Observed behavior:
* I am able to pass RDMA traffic between the nodes with perftest (ib_write_bw) when using the breakout cable as the interconnect from both nodes to the switch.
* When attempting to run a job using the breakout cable as the interconnect, Open MPI aborts with errors about failing to initialize an OpenFabrics device.
* If I replace the breakout cable with 2 standard QSFP cables, the Open MPI job completes correctly.

This is the command I use; it works unless I attempt a run with the breakout cable as the interconnect:

    mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1

If anyone has any idea as to why using a breakout cable is causing my jobs to fail, please let me know.

Thank you,
Brendan T. W. Myers
brendan.my...@soft-forge.com
Software Forge Inc
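For reference, a minimal sketch of the debug rebuild and verbose rerun Howard asks for above. The source directory and install prefix are assumptions; the MCA options mirror Brendan's original command line.

    # Rebuild Open MPI 2.0.1 with debugging enabled (paths are assumptions)
    cd openmpi-2.0.1
    ./configure --prefix=/opt/openmpi-2.0.1-dbg --enable-debug
    make -j 8 && make install

    # Rerun the failing breakout-cable case with verbose BTL output,
    # keeping the original receive-queue and connection-manager settings
    /opt/openmpi-2.0.1-dbg/bin/mpirun \
        --mca btl openib,self,sm \
        --mca btl_openib_receive_queues P,65536,120,64,32 \
        --mca btl_openib_cpc_include rdmacm \
        --mca btl_base_verbose 100 \
        -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1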
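Since these cards expose each port as its own device, one sanity check is that the openib BTL is actually selecting mlx5_0 port 1. A sketch, assuming libibverbs' ibv_devinfo is installed; restricting the BTL with btl_openib_if_include is a guess worth testing, not a confirmed fix.

    # Confirm port 1 of mlx5_0 is PORT_ACTIVE with an Ethernet link layer
    ibv_devinfo -d mlx5_0 -i 1

    # Pin the openib BTL to that one device:port so it cannot pick a
    # different port on the breakout setup (everything else unchanged)
    mpirun --mca btl openib,self,sm \
        --mca btl_openib_if_include mlx5_0:1 \
        --mca btl_openib_receive_queues P,65536,120,64,32 \
        --mca btl_openib_cpc_include rdmacm \
        -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1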
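The perftest run that already passes can also be made to exercise the same connection path as Open MPI: the -R flag tells ib_write_bw to set up its queue pairs through the RDMA connection manager, which is what the rdmacm CPC uses. If this also works over the breakout cable, the fabric itself is likely fine and the problem sits in Open MPI's device selection. The hostname below is a placeholder.

    # On node 1 (server), using mlx5_0 port 1 and RDMA CM connection setup
    ib_write_bw -d mlx5_0 -i 1 -R

    # On node 2 (client), pointing at node 1
    ib_write_bw -d mlx5_0 -i 1 -R node1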