Emmanuel -- Looks like the right people missed this when you posted; sorry about that!
We're tracking it now: https://github.com/open-mpi/ompi/issues/6976

On Sep 13, 2019, at 3:04 AM, Emmanuel Thomé via users <users@lists.open-mpi.org> wrote:

> Hi,
>
> Thanks Jeff for your reply, and sorry for this late follow-up...
>
> On Sun, Aug 11, 2019 at 02:27:53PM -0700, Jeff Hammond wrote:
>
> > > openmpi-4.0.1 gives essentially the same results (similar files
> > > attached), but with various doubts on my part as to whether I've run
> > > this check correctly. Here are my doubts:
> > >
> > > - whether I should or should not have a UCX build for an Omni-Path
> > >   cluster (IIUC https://github.com/openucx/ucx/issues/750 is now
> > >   fixed?),
> >
> > UCX is not optimized for Omni Path. Don't use it.
>
> Good. Does that mean that the information conveyed by this message is
> incomplete? It's easy to misconstrue it as an invitation to enable UCX.
>
> --------------------------------------------------------------------------
> By default, for Open MPI 4.0 and later, infiniband ports on a device
> are not used by default. The intent is to use UCX for these devices.
> You can override this policy by setting the btl_openib_allow_ib MCA
> parameter to true.
>
>   Local host:     node0
>   Local adapter:  hfi1_0
>   Local port:     1
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: There was an error initializing an OpenFabrics device.
>
>   Local host:    node0
>   Local device:  hfi1_0
> --------------------------------------------------------------------------
>
> > > - which btl I should use (I understand that openib is headed for
> > >   deprecation, and it complains unless I do --mca btl openib --mca
> > >   btl_openib_allow_ib true; fine. But then, which non-openib,
> > >   non-tcp btl should I use instead?)
> >
> > OFI->PS2 and PSM2 are the right conduits for Omni Path.
>
> I assume you meant ofi->psm2 and psm2. I understand that --mca mtl ofi
> should be right in that case, and that --mca mtl psm2 should be as well.
> Unfortunately that doesn't tell me much about pml and btl selection, if
> these happen to matter (pml certainly does, based on my initial report).
>
> > It sounds like Open-MPI doesn't properly support the maximum transfer
> > size of PSM2. One way to work around this is to wrap your MPI
> > collective calls and do <4G chunking yourself.
>
> I'm afraid that's not a very satisfactory answer. Once I've spent some
> time diagnosing the issue, sure, I could do that sort of kludge. But the
> path to discovering the issue is long-winded. I'd have been *MUCH*
> better off if Open MPI had spat out a big, loud error message (like it
> does for psm2). The fact that it silently omits copying some of my data
> with the ofi mtl is extremely annoying.
>
> Best,
>
> E.

-- 
Jeff Squyres
jsquy...@cisco.com
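For reference, a minimal sketch of the kind of chunking wrapper Jeff Hammond suggests above, under a few assumptions: the payload can be treated as a flat byte buffer, MPI_Bcast stands in for whatever collective is actually used, and the names chunked_bcast and CHUNK_BYTES are illustrative only (not part of any Open MPI API), with 1 GiB chosen arbitrarily to stay well below the 4 GiB boundary.

    /* Illustrative sketch: split one large broadcast into pieces that
     * each stay well below 4 GiB, working around the silent truncation
     * described above.  Assumes a contiguous byte buffer. */
    #include <mpi.h>
    #include <stddef.h>

    #define CHUNK_BYTES (1UL << 30)   /* 1 GiB per call, comfortably < 4 GiB */

    static int chunked_bcast(void *buf, size_t total_bytes, int root, MPI_Comm comm)
    {
        char *p = (char *) buf;
        while (total_bytes > 0) {
            /* Send at most CHUNK_BYTES per MPI_Bcast call. */
            size_t n = total_bytes < CHUNK_BYTES ? total_bytes : CHUNK_BYTES;
            int rc = MPI_Bcast(p, (int) n, MPI_BYTE, root, comm);
            if (rc != MPI_SUCCESS)
                return rc;
            p += n;
            total_bytes -= n;
        }
        return MPI_SUCCESS;
    }

The same pattern applies to other collectives at the cost of extra latency per chunk; it is a workaround, not a substitute for the fix tracked in the issue above.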