Hello, I am trying to figure out how to verify that my Open MPI traffic is actually being transmitted over the RoCE fabric connecting my cluster. My MPI job runs quickly and without errors, but I cannot confirm that any significant amount of data is being transferred to the other endpoint in the RoCE fabric. When I remove the oob exclusion from my command and watch my RoCE interface with the tools listed below, I do see what I believe to be the oob data.
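One further check I could run is to sample the adapter's own port counters immediately before and after the job. The following is only a minimal sketch, assuming the RoCE device exposes the usual Linux sysfs counters port_xmit_data and port_rcv_data (counted in 4-byte words) under /sys/class/infiniband/<device>/ports/<port>/counters, and that those counters do not saturate or reset during the run. The helper names and the device name mlx4_0 in the usage note are placeholders, not something taken from my setup.

```python
import subprocess
from pathlib import Path

# Assumption: port_xmit_data and port_rcv_data are reported in 4-byte words.
WORD_BYTES = 4
COUNTER_NAMES = ("port_xmit_data", "port_rcv_data")


def read_counters(counters_dir):
    """Read the data counters from a sysfs counters directory, in bytes."""
    base = Path(counters_dir)
    return {
        name: int((base / name).read_text().strip()) * WORD_BYTES
        for name in COUNTER_NAMES
    }


def counter_delta(before, after):
    """Return how many bytes each counter advanced between two samples."""
    return {name: after[name] - before[name] for name in before}


def measure(counters_dir, command):
    """Run a command (a list of arguments) and return the counter deltas."""
    before = read_counters(counters_dir)
    subprocess.run(command, check=True)
    after = read_counters(counters_dir)
    return counter_delta(before, after)
```

Hypothetical usage on one of the two nodes: measure("/sys/class/infiniband/mlx4_0/ports/1/counters", ["mpirun", ...]) with the full mpirun argument list from the command below. If the byte deltas stay near zero while the benchmark reports large message sizes, that would suggest the traffic is not leaving through this port.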
Software:
* CentOS 7.2
* Open MPI 2.0.1

Command:
* mpirun --mca btl openib,self,sm --mca oob_tcp_if_exclude eth3 --mca btl_openib_receive_queues P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm -np 4 -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1
  o eth3 is my RoCE interface
  o The RoCE interfaces of the 2 nodes involved are defined in my mpi-hosts-ce file

Ways I have tried to verify the data transfer:
* Through the port counters on my RoCE switch
  o Shows data being sent when using ib_write_bw but not when using Open MPI
* Through ibdump
  o Shows data being sent when using ib_write_bw but not when using Open MPI
* Through Wireshark
  o Shows data being sent when using ib_write_bw but not when using Open MPI

I do not have much experience with Open MPI and apologize if I have left out necessary information. I will respond with any data requested. I appreciate the time spent reading and responding to this.

Thank you,

Brendan T. W. Myers
brendan.my...@soft-forge.com
Software Forge Inc