Hello,

I am trying to verify that my Open MPI traffic is actually being
transmitted over the RoCE fabric connecting my cluster.  My MPI job runs
quickly and without errors, but I cannot confirm that any significant
amount of data is being transferred to the other endpoint in the fabric.
When I remove the OOB exclusion from my command and analyze my RoCE
interface using the tools listed below, I can see what I believe to be the
OOB data.

Software:

*         CentOS 7.2

*         Open MPI 2.0.1

Command:

*         mpirun --mca btl openib,self,sm --mca oob_tcp_if_exclude eth3 \
            --mca btl_openib_receive_queues P,65536,120,64,32 \
            --mca btl_openib_cpc_include rdmacm -np 4 -hostfile mpi-hosts-ce \
            /usr/local/bin/IMB-MPI1

o   eth3 is my RoCE interface

o   The RoCE interfaces of the two nodes involved are defined in my
mpi-hosts-ce file

Ways I have tried to verify the data transfer:

*         Through the port counters on my RoCE switch

o   Sees data being sent when using ib_write_bw but not when using Open MPI

*         Through ibdump

o   Sees data being sent when using ib_write_bw but not when using Open MPI

*         Through Wireshark

o   Sees data being sent when using ib_write_bw but not when using Open MPI
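
A host-side counterpart to the switch port counters above could be sketched
as follows.  This is an illustration under assumptions, not something I have
run on this cluster: it assumes the adapter exposes the usual sysfs counter
files port_xmit_data and port_rcv_data under
/sys/class/infiniband/<device>/ports/<port>/counters, and that those counters
count 4-byte words (my understanding of the InfiniBand convention).  The
function names and the device name in the usage note are hypothetical.

```python
import os

# Counter file names assumed to exist in the adapter's sysfs "counters" directory.
COUNTER_NAMES = ("port_xmit_data", "port_rcv_data")


def read_counter(counters_dir, name):
    """Read a single integer counter file from the given directory."""
    with open(os.path.join(counters_dir, name)) as f:
        return int(f.read().strip())


def snapshot(counters_dir, names=COUNTER_NAMES):
    """Return a dict mapping each counter name to its current value."""
    return {name: read_counter(counters_dir, name) for name in names}


def delta_bytes(before, after, word_size=4):
    """Convert the counter difference between two snapshots to bytes.

    word_size=4 reflects the assumption that the counters count
    4-byte words; adjust it if the adapter reports plain octets.
    """
    return {name: (after[name] - before[name]) * word_size for name in before}
```

The idea would be to call snapshot() on a directory such as
/sys/class/infiniband/mlx4_0/ports/1/counters (device name assumed) before
and after the mpirun command, then pass both results to delta_bytes().  A
delta near zero during the IMB-MPI1 run would be consistent with what the
switch counters show.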

 

I do not have much experience with Open MPI and apologize if I have left out
necessary information.  I will respond with any data requested.  I
appreciate the time spent to read and respond to this.

Thank you,

Brendan T. W. Myers

brendan.my...@soft-forge.com

Software Forge Inc

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
