These are in /dev, and also in /sys.
On my system:
$ ls -al /dev/infiniband/
drwxr-xr-x 2 root root 120 Nov 10 17:08 .
drwxr-xr-x 21 root root 3980 Dec 13 03:09 ..
crw-rw---- 1 root root 231, 64 Nov 10 17:08 issm0
crw-rw-rw- 1 root root 10, 56 Nov 10 17:08 rdma_cm
crw-rw---- 1 root root 231, 0 Nov 10 17:08 umad0
crw-rw-rw- 1 root root 231, 192 Nov 10 17:08 uverbs0
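A small shell function can turn that listing into a direct answer for the account that actually launches the job. It tests read and write access on each node in the directory. In the trace further down in this message, Open MPI opens uverbs0 and rdma_cm with O_RDWR, so those are the nodes to look at first. This is a minimal sketch under stated assumptions. The function name check_ib_devices and its optional directory argument are hypothetical. Root normally passes these access tests regardless of the mode bits, so run it as the ordinary user on each blade.

```shell
# Hypothetical helper (not part of Open MPI): report whether the current
# user can open each InfiniBand device node for both reading and writing.
# Usage: check_ib_devices [directory]   (defaults to /dev/infiniband)
check_ib_devices() {
    dir="${1:-/dev/infiniband}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    for f in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$f" ] || continue
        # test -r / -w check access for the calling user (root normally passes).
        if [ -r "$f" ] && [ -w "$f" ]; then
            echo "ok: $f"
        else
            echo "denied: $f"
        fi
    done
}
```

For example, paste the function into a shell on bud4 as the user that runs MCNP6, then run `check_ib_devices`. Any `denied:` line points at a node whose mode bits or group ownership would need a closer look.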
Here is what you can do to find out what is going wrong on your system:
/* note: if you are running SELinux, that might also cause some issues */
$ mpirun -np 1 strace -e open,stat -o /tmp/hello.strace -- ./hello_c
Hello, world, I am 0 of 1, (Open MPI v3.0.0a1, package: Open MPI gilles@xxx Distribution, ident: 3.0.0a1, repo rev: dev-3197-g4323016, Unreleased developer copy, 160)
$ grep -v ENOENT /tmp/hello.strace | grep /dev/
open("/dev/shm/open_mpi.0000", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 6
open("/dev/infiniband/uverbs0", O_RDWR) = 17
open("/dev/infiniband/uverbs0", O_RDWR) = 19
open("/dev/infiniband/rdma_cm", O_RDWR) = 21
open("/dev/infiniband/rdma_cm", O_RDWR) = 21
open("/dev/infiniband/rdma_cm", O_RDWR) = 21
open("/dev/infiniband/rdma_cm", O_RDWR) = 21
$ grep -v ENOENT /tmp/hello.strace | grep /sys/
open("/sys/devices/system/cpu/possible", O_RDONLY) = 18
stat("/sys/class/infiniband", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
open("/sys/class/infiniband", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 17
open("/sys/class/infiniband_verbs/abi_version", O_RDONLY) = 17
open("/sys/class/infiniband_verbs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 17
stat("/sys/class/infiniband_verbs/abi_version", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
stat("/sys/class/infiniband_verbs/uverbs0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
open("/sys/class/infiniband_verbs/uverbs0/ibdev", O_RDONLY) = 18
open("/sys/class/infiniband_verbs/uverbs0/abi_version", O_RDONLY) = 18
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband/mlx4_0/node_type", O_RDONLY) = 17
open("/sys/class/infiniband/mlx4_0/device/local_cpus", O_RDONLY) = 17
open("/sys/class/infiniband/mlx4_0/ports/1/gids/0", O_RDONLY) = 19
open("/sys/class/misc/rdma_cm/abi_version", O_RDONLY) = 19
open("/sys/class/infiniband/mlx4_0/node_guid", O_RDONLY) = 19
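The greps above drop ENOENT. A device-level permissions problem like the one Ralph describes would therefore show up in the same trace as EACCES or EPERM on one of these paths. The minimal sketch that follows pulls out only those failures. The function name ib_permission_errors is hypothetical, and the default file name is the /tmp/hello.strace used above. On newer systems glibc may call openat() instead of open(), so the strace filter may need to include openat as well. The path-based grep matches either form.

```shell
# Hypothetical helper (not part of Open MPI or strace): print traced calls on
# InfiniBand device or sysfs paths that failed with EACCES or EPERM.
# Usage: ib_permission_errors [tracefile]   (defaults to /tmp/hello.strace)
ib_permission_errors() {
    trace="${1:-/tmp/hello.strace}"
    # First keep only lines touching /dev/infiniband or the IB sysfs classes,
    # then keep only those that failed because of permissions.
    grep -E '/dev/infiniband/|/sys/class/(infiniband|misc/rdma_cm)' "$trace" \
        | grep -E 'EACCES|EPERM'
}
```

Empty output means no permission failure was recorded on those paths. Because mpirun -np 1 only exercises the local node, the trace may need to be repeated on each blade.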
Cheers,
Gilles
On 12/18/2015 9:11 AM, Ralph Castain wrote:
To be honest, it’s been a very long time since I had an IB machine.
Howard, Nathan, or anyone else who has one: can you answer?
On Dec 17, 2015, at 3:53 PM, Bathke, Chuck <bat...@lanl.gov> wrote:
Ralph,
Where would these be, in /dev?
Chuck
*From*: Ralph Castain [mailto:r...@open-mpi.org]
*Sent*: Thursday, December 17, 2015 04:13 PM
*To*: Open MPI Users <us...@open-mpi.org>
*Subject*: Re: [OMPI users] Need help resolving "error obtaining device context for mlx4_0"
You might want to check the permissions on the MLX device directory:
according to that error message, the permissions are preventing you
from accessing the device. Without that access, we have no way to
communicate across nodes; you can run on one node using shared memory,
but not across multiple nodes.
So it looks like there is a device-level permissions issue in play.
On Dec 17, 2015, at 2:39 PM, Bathke, Chuck <bat...@lanl.gov> wrote:
Hi,
I have a system of AMD blades that I am trying to run MCNP6 on
using Open MPI. I installed openmpi-1.6.5. I have also installed the
Intel Fortran and C compilers. I compiled MCNP6 using FC="mpif90"
CC="mpicc" … It runs just fine when I run it on a 1-hour test case
on just one blade. I need to run it on several blades, but it issues
an error and crashes and burns. I have sought help here, but no one
seems to know how to fix it. I have mounted /opt and /home on bud
and bud6 on the corresponding /opt and /home on bud4, at their
suggestion. That did not fix anything. Please look at the attached
file (created with bud4>tar -zcf info.tgz mpihT3), which holds the
data requested at https://www.open-mpi.org/community/help/ and
in bullet 13 on https://www.open-mpi.org/community/help/. Can you
look at it and suggest a solution? I suspect that it is something
trivial that does not stand out and say, “look here, you idiot.” Thanks.
Charles "Chuck" Bathke
MS-C921
Los Alamos National Laboratory
P.O. Box 1663
Los Alamos, NM 87545
Phone:(505)667-7214
Cell:(505)695-5709
Fax: 505-665-2897
Location: TA-16, Building 0200, Room 125
NEN-5 Group Office: 505-667-0914
<info.tgz>
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/12/28178.php