> Fair enough Ralph! I was implicitly assuming a "build once / run everywhere" 
> use case, my bad for not making my assumption clear.
> If the container is built to run on a specific host, there are indeed other 
> options to achieve near-native performance.
> 

Err...that isn't actually what I meant, nor what we did. You can, in fact, 
build a container that can "run everywhere" while still employing high-speed 
fabric support. What you do is:

* configure OMPI with all the fabrics enabled (or at least all the ones you 
care about) - see the configure sketch after this list

* don't include the fabric drivers in your container. These can/will vary 
across deployments, especially those (like NVIDIA's) that involve kernel modules

* set up your container to mount specified external device driver locations 
onto the locations where you configured OMPI to find them. Sadly, this does 
violate the container boundary - but nobody has come up with another 
solution, and at least the violation is confined to just the device drivers. 
Typically, you specify the external locations to be mounted using an envar 
or some other mechanism appropriate to your container, and then include the 
relevant information when launching the containers (see the bind-mount 
sketch after this list).
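
For the configure step, a minimal sketch - assuming libfabric (OFI) and 
UCX are the fabrics you care about, and that the install paths shown are 
where those libraries live on your build host:

    # enable OFI (libfabric) and UCX support; paths are assumptions
    ./configure --prefix=/opt/ompi \
        --with-ofi=/usr/local/ofi \
        --with-ucx=/usr/local/ucx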
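
For the bind-mount step, an illustration using Singularity/Apptainer 
syntax purely as an example - the host and container paths here are 
hypothetical:

    # bind the host's fabric libraries onto the paths the
    # in-container OMPI was configured against
    mpirun -np 4 singularity exec \
        --bind /opt/host-fabric/ucx:/usr/local/ucx \
        myapp.sif ./my_mpi_app

    # or specify the same bindings via an envar:
    export SINGULARITY_BIND="/opt/host-fabric/ucx:/usr/local/ucx"
    mpirun -np 4 singularity exec myapp.sif ./my_mpi_app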

When OMPI initializes, it will follow its normal procedure of attempting to 
load each fabric's drivers, selecting the transports whose drivers it can 
load. NOTE: beginning with OMPI v5, you'll need to explicitly tell OMPI to 
build the fabric plugins as dynamic plugins rather than statically linking 
them in, or else this will probably fail - see the sketch below.
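
One way to do this is the --enable-mca-dso configure option, which builds 
components as dynamically loadable plugins (DSOs) instead of linking them 
into the library. A minimal sketch, extending the configure line above - 
the option also accepts a comma-separated component list if you only want 
the fabric plugins built this way:

    # build components as loadable plugins so OMPI can simply skip
    # any fabric whose host drivers aren't visible at run time
    ./configure --prefix=/opt/ompi \
        --with-ofi=/usr/local/ofi \
        --with-ucx=/usr/local/ucx \
        --enable-mca-dso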

At least one vendor now distributes OMPI containers preconfigured with their 
fabric support based on this method. So using a "generic" container doesn't 
mean you lose performance - in fact, our tests showed zero performance 
impact from this method.

HTH
Ralph

