> Fair enough Ralph! I was implicitly assuming a "build once / run everywhere"
> use case, my bad for not making my assumption clear.
> If the container is built to run on a specific host, there are indeed other
> options to achieve near-native performance.
Err...that isn't actually what I meant, nor what we did. You can, in fact, build a container that can "run everywhere" while still employing high-speed fabric support. What you do is:

* configure OMPI with all the fabrics enabled (or at least all the ones you care about)
* don't include the fabric drivers in your container. These can/will vary across deployments, especially those (like NVIDIA's) that involve kernel modules
* set up your container to mount specified external device driver locations onto the locations where you configured OMPI to find them. Sadly, this does violate the container boundary - but nobody has come up with another solution, and at least the violation is confined to just the device drivers.

(A rough command-line sketch of these steps is at the bottom of this message.)

Typically, you specify the external locations that are to be mounted using an envar or some other mechanism appropriate to your container, and then include the relevant information when launching the containers. When OMPI initializes, it will go through its normal procedure of attempting to load each fabric's drivers, selecting the transports whose drivers it can load.

NOTE: beginning with OMPI v5, you'll need to explicitly tell OMPI to build without statically linking in the fabric plugins, or else this probably will fail.

At least one vendor now distributes OMPI containers preconfigured with their fabric support based on this method. So using a "generic" container doesn't mean you lose performance - in fact, our tests showed zero impact on performance using this method.

HTH
Ralph
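
P.S. In case a concrete starting point helps, here is a minimal sketch of the recipe above. It assumes Singularity/Apptainer as the container runtime, and the bind paths and image name (my_ompi_app.sif) are illustrative, not canonical; configure flag spellings also vary across OMPI releases, so check ./configure --help for your version.

# --- Image build time ---
# Configure OMPI with the fabrics you care about enabled, and (per the
# v5 note above) with its components built as dynamic plugins rather
# than statically linked in. Do NOT copy the host's fabric driver
# libraries into the image; they get bind-mounted in at run time.
./configure --prefix=/opt/ompi --with-ucx --with-ofi --enable-mca-dso
make -j8 && make install

# --- Run time (hybrid model: mpirun starts one container per rank) ---
# Bind the host's driver locations onto the paths OMPI was configured
# to search. These two paths are examples from a verbs-based system -
# substitute whatever your site's fabric stack actually requires.
mpirun -n 4 singularity exec \
    --bind /usr/lib64/libibverbs:/usr/lib64/libibverbs \
    --bind /etc/libibverbs.d:/etc/libibverbs.d \
    my_ompi_app.sif /opt/app/bin/app

Equivalently, most runtimes let you pass the bind list through an environment variable (e.g. SINGULARITY_BINDPATH) instead of on the command line - that's the "envar" mechanism mentioned above, and it keeps the launch command generic across sites.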