Hi Open MPI Users and Developers,

I would like to know about your experience with the optional Mellanox middleware for recent Mellanox InfiniBand HCAs, especially MXM and FCA (whose latest versions bring HCOLL, I think), and with the corresponding Open MPI frameworks/components: MTL/mxm, PML/yalla, COLL/fca and COLL/hcoll.
Does MXM, when used with MTL/mxm or PML/yalla, really improve communication speed over the plain BTL/openib? In particular, since MXM performs message tag matching, I suppose that in addition to slightly improving the usual latency/bandwidth metrics, it would increase the communication/computation overlap potential with non-blocking MPI calls, since the adapter is more autonomous (the first sketch in the P.S. below shows the pattern I have in mind). I remember that with old Myrinet networks, the matching MX middleware was much better for our application than the earlier non-matching GM middleware. I guess the same applies now to InfiniBand/OpenFabrics networks: matching middleware should be better.

Also, concerning FCA and HCOLL, do they really improve the speed of collective operations? From the Mellanox documentation, I saw that they are supposed to use hardware broadcast and to take the topology into account, favoring the faster connections between processes located on the same node. I also saw in these documents that recent versions of FCA are able to perform reduction operations on the HCA itself, even floating-point ones. This should greatly improve the speed of MPI_Allreduce() in our codes (see the second sketch in the P.S.)!

So, for those lucky enough to have access to a recent, well-configured Mellanox InfiniBand cluster with recent middleware and an Open MPI library configured to take advantage of it: does it deliver on its promises? The only documentation/reports I could find on the Internet about these subjects are from Mellanox, in addition to this presentation for PML/yalla and MTL/mxm (slide 32):

https://www.open-mpi.org/papers/sc-2014/Open-MPI-SC14-BOF.pdf

Thanks in advance,

Martin Audet
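P.S. To make the overlap question more concrete, here is roughly the pattern I have in mind, reduced to a minimal sketch (not our real code). The mpirun lines in the header comment are only my guess, from the FAQ and the Mellanox documents, at how to select each transport, so please correct me if these MCA parameters are wrong.

/*
 * overlap_test.c -- minimal sketch of the communication/computation
 * overlap pattern I have in mind.
 *
 * My guess at how to select each transport (please correct me):
 *   mpirun --mca pml ob1 --mca btl openib,self,sm ./overlap_test    (plain BTL/openib)
 *   mpirun --mca pml cm  --mca mtl mxm            ./overlap_test    (MTL/mxm)
 *   mpirun --mca pml yalla                        ./overlap_test    (PML/yalla)
 */
#include <mpi.h>
#include <stdlib.h>

#define N (1 << 20)

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sendbuf = malloc(N * sizeof(double));
    double *recvbuf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++)
        sendbuf[i] = rank + i;

    int peer = rank ^ 1;                 /* exchange with neighbouring rank */
    MPI_Request req[2];

    if (peer < size) {
        /* Post the non-blocking exchange first ...                        */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[1]);
    }

    /* ... then do independent computation while the transfer progresses.
     * With tag matching handled by MXM / the adapter, the hope is that
     * this work really overlaps the transfer instead of the transfer
     * only progressing once we reach MPI_Waitall().                       */
    double local = 0.0;
    for (int i = 0; i < N; i++)
        local += sendbuf[i] * 1.000001;

    if (peer < size)
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return (int)(local < 0.0);           /* keep 'local' from being optimized away */
}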
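And on the collectives side, the operation whose performance I care most about is essentially just the following floating-point MPI_Allreduce(). The MCA parameters mentioned in the comment (coll_fca_enable, coll_hcoll_enable) are again only my reading of the documentation for enabling the offloaded collectives, so take them as an assumption.

/* allreduce_test.c -- minimal sketch of the floating-point MPI_Allreduce
 * pattern from our codes. Enabling the offloaded collectives is, as far
 * as I can tell, something like --mca coll_fca_enable 1 or
 * --mca coll_hcoll_enable 1 (my guess from the docs).                    */
#include <mpi.h>

int main(int argc, char **argv)
{
    double local[3], global[3];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Per-rank partial results, e.g. residual norms from a solver step.  */
    local[0] = 1.0 + rank;
    local[1] = 0.5 * rank;
    local[2] = (double)rank * rank;

    /* The floating-point reduction that, according to the Mellanox docs,
     * recent FCA/HCOLL could perform on the HCA instead of the host CPUs. */
    MPI_Allreduce(local, global, 3, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}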