On 10/19/2015 07:09 PM, Shamis, Pavel wrote:
Please see inline (marked with "Pasha >").

From: users <users-boun...@open-mpi.org> on behalf of John Marshall <john.marsh...@ssc-spc.gc.ca>
Reply-To: Open Users <us...@open-mpi.org>
Date: Monday, October 19, 2015 11:06 AM
To: Open Users <us...@open-mpi.org>
Subject: Re: [OMPI users] openib issue with 1.6.5 but not later releases


Further efforts have shown that if we add:

    export OMPI_MCA_btl_openib_if_include=<device>

where device corresponds to the IB interface (e.g., mlx4_14), then
our test does not fail (yet, anyways).
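
For reference, a minimal sketch of the two usual ways to pass this MCA
parameter, assuming a POSIX-style shell. The device name mlx4_14 is only the
example from above, and ./a.out is a hypothetical test binary, not something
from this thread.

```shell
# Restrict the openib BTL to a single HCA through the environment.
# "mlx4_14" is the example device name from the message above.
device="mlx4_14"
export OMPI_MCA_btl_openib_if_include="$device"

# Equivalent per-run form on the mpirun command line
# (./a.out is a placeholder for the real test program):
#   mpirun --mca btl_openib_if_include "$device" -np 2 ./a.out

# Show what the MPI processes will inherit from this shell.
echo "OMPI_MCA_btl_openib_if_include=$OMPI_MCA_btl_openib_if_include"
```

The environment form is convenient when the launcher command line is not
under our control; the command-line form keeps the setting local to one run.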

Pasha > This is a pretty clear indicator that each container sees more than a
single device.
Can you run ibv_devinfo -v within the container and see what happens?

Yes. It shows info for many hca_id entries: mlx4_0 to mlx4_16.


So, is this setting required if there are multiple IB interfaces (as
when there are multiple eth interfaces)? What is curious is that
there is only one interface visible from the container. Does the
openib btl look deeper and find all that exist in the host?

Pasha > Not really. We use the Verbs driver to fetch the list of devices on the
"node".

Is there something about the openib implementations in 1.8 and
1.10 that may handle this differently since we do not set
OMPI_MCA_btl_openib_if_include but our tests seem to work? Or,
is it a fluke?

Pasha > I was not much involved in 1.8 and 1.10, so it is a bit hard to
comment.
I suspect this is related to the locality feature: the openib btl selects one
device, creates only one btl instance, and ignores all the rest.

So if I understand correctly, we do not need to worry about this for 1.8 and 1.10.

Since it is possible to see many hca_id entries, even in the container, what
do we need to do under 1.6.5? Can we use a single mlx4_# (e.g., mlx4_0) for
all or do we need to select one based on the ib# interface? We expect to
run multiple containers on a single host where each container gets a
unique/dedicated ib# interface.
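
One possible way to select the device from the ib# interface, sketched under
assumptions that this thread does not confirm: that the container sees the
usual Linux sysfs layout, in which each IPoIB interface links to its HCA under
<ifname>/device/infiniband/. The function name hca_for_netdev and the
IB_IFNAME variable are hypothetical names used only for illustration.

```shell
# Print the HCA name (e.g. mlx4_0) behind an IPoIB interface (e.g. ib0).
# Assumes the sysfs layout <root>/<ifname>/device/infiniband/<hca>.
# The optional second argument only exists so the lookup can be tried
# against a fake directory tree instead of the real /sys.
hca_for_netdev() {
    ifname="$1"
    root="${2:-/sys/class/net}"
    dir="$root/$ifname/device/infiniband"
    [ -d "$dir" ] || return 1
    for entry in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; treat that as failure.
        [ -e "$entry" ] || return 1
        basename "$entry"
        return 0
    done
}

# Hypothetical usage inside a container that was given one interface.
# IB_IFNAME is an assumed variable; it falls back to ib0 when unset.
if hca=$(hca_for_netdev "${IB_IFNAME:-ib0}"); then
    export OMPI_MCA_btl_openib_if_include="$hca"
fi
```

If two ib# interfaces turn out to be ports of the same device, the device name
alone will not distinguish them; btl_openib_if_include also accepts a
device:port form (for example mlx4_0:1) for that case.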

Thanks,
John



Best,
Pasha



--
*John Marshall*
High Performance Computing Support, Science, Operations
Shared Services Canada / Government of Canada
john.marsh...@ssc-spc.gc.ca / Tel: 514-421-4647