Hi, I am currently working on a parallel app that runs into a problem with the MX BTL (not the MTL) in the current trunk version of OpenMPI.
Basically, for its communication the app does a lot of random <= 8 KB MPI_Isend()s, which are polled away on the receiving side using MPI_Iprobe() and MPI_Recv(). The asynchronous send requests are put into a ring (currently 64 entries) from which they are MPI_Wait()ed.

This works perfectly fine with OpenMPI over both TCP and MX/MTL, but given a sufficient number of CPUs (currently close to 96) the app hangs quite reproducibly in some phase when using the trunk's MX BTL implementation. [As an aside, the reason for using the BTL here is that I am actually interested in experimenting with the app across multiple clusters, in mixed MX+TCP mode, which has recently become possible with the BTL. The issue also shows up in that mixed configuration.]

Since the same issue did not occur with the released 1.2 versions of OpenMPI, I started digging through the trunk revisions. Having no clue where to begin, I basically did a binary search of the revisions between the early r12000s and the most recent one. It turned out that the issue first appears at revision 12931, where (among other changes) mca_btl_mx_module.super.btl_eager_limit and mca_btl_mx_module.super.btl_min_send_size were reduced to 4 KB. If I change these back to their original values (just below 16 KB and 32 KB, respectively), the problem goes away, both in r12931 and in the most recent revisions. Given that the app uses messages of up to 8 KB, this change indeed influences the low-level communication behavior.

I am sure some OpenMPI developer knows what to do with the above :-) If you need more feedback from me, or want me to try alternative options or configs, just let me know.

Regards,
Kees Verstoep
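P.S. In case it helps with reproducing, below is a minimal sketch (plain MPI C) of the communication pattern described above. It is not the actual application: the buffer sizes, tag, message count, and the MPI_Alltoall-based termination scheme are just illustrative assumptions I picked to keep the sketch self-contained. The ring reuse (MPI_Wait() on the oldest slot before posting a new MPI_Isend()) is what keeps the number of outstanding sends bounded at 64, matching the app's setup.

/* Illustrative sketch of the communication pattern: random destinations,
 * messages of at most 8 KB, a 64-entry ring of outstanding MPI_Isend()
 * requests, and incoming messages polled away with MPI_Iprobe()/MPI_Recv().
 * Sizes, tag and termination scheme are placeholders, not the real app. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 64
#define MAX_MSG   8192
#define NUM_MSGS  2000
#define TAG       42

/* Drain all messages currently available; returns the number received. */
static int drain(char *recvbuf)
{
    int flag, n = 0;
    MPI_Status st;

    MPI_Iprobe(MPI_ANY_SOURCE, TAG, MPI_COMM_WORLD, &flag, &st);
    while (flag) {
        MPI_Recv(recvbuf, MAX_MSG, MPI_BYTE, st.MPI_SOURCE, TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        n++;
        MPI_Iprobe(MPI_ANY_SOURCE, TAG, MPI_COMM_WORLD, &flag, &st);
    }
    return n;
}

int main(int argc, char **argv)
{
    int rank, size, i, slot = 0, received = 0, expected = 0, done = 0;
    char (*sendbuf)[MAX_MSG];
    char recvbuf[MAX_MSG];
    int *dest, *sendcnt, *recvcnt;
    MPI_Request ring[RING_SIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = malloc(RING_SIZE * sizeof(*sendbuf));
    dest    = malloc(NUM_MSGS * sizeof(int));
    sendcnt = calloc(size, sizeof(int));
    recvcnt = malloc(size * sizeof(int));

    /* Pick random destinations up front so every rank can learn how many
     * messages to expect (the real app has its own termination logic). */
    srand(rank + 1);
    for (i = 0; i < NUM_MSGS; i++) {
        dest[i] = rand() % size;
        sendcnt[dest[i]]++;
    }
    MPI_Alltoall(sendcnt, 1, MPI_INT, recvcnt, 1, MPI_INT, MPI_COMM_WORLD);
    for (i = 0; i < size; i++)
        expected += recvcnt[i];

    for (i = 0; i < RING_SIZE; i++)
        ring[i] = MPI_REQUEST_NULL;

    for (i = 0; i < NUM_MSGS; i++) {
        int len = 1 + rand() % MAX_MSG;   /* random size, <= 8 KB */

        received += drain(recvbuf);       /* poll incoming messages */

        /* Reuse the oldest slot in the 64-entry request ring. */
        MPI_Wait(&ring[slot], MPI_STATUS_IGNORE);
        memset(sendbuf[slot], rank & 0xff, (size_t)len);
        MPI_Isend(sendbuf[slot], len, MPI_BYTE, dest[i], TAG,
                  MPI_COMM_WORLD, &ring[slot]);
        slot = (slot + 1) % RING_SIZE;
    }

    /* Keep receiving while waiting for our own sends to complete. */
    while (!done || received < expected) {
        received += drain(recvbuf);
        if (!done)
            MPI_Testall(RING_SIZE, ring, &done, MPI_STATUSES_IGNORE);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    free(sendbuf); free(dest); free(sendcnt); free(recvcnt);
    return 0;
}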