>Anyway,  /dev/hfi1_0 doesn't exist.
Make sure you have the hfi1 module/driver loaded.
In addition, please confirm that the links are in the active state on all the nodes with:
`opainfo`
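
A minimal sketch of the module and device-node checks, assuming a Linux node that exposes /proc/modules. The function names are hypothetical helpers, not part of any Intel or Open MPI tool, and the optional path arguments exist only so the helpers can be pointed at a test file.

```shell
# Hypothetical helpers; the defaults point at the real kernel interfaces.

# Succeeds if the named kernel module is listed in /proc/modules
# (the same data that lsmod prints).
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    grep -q "^${mod} " "$modules_file"
}

# Succeeds if the device node exists.
device_present() {
    [ -e "${1:-/dev/hfi1_0}" ]
}

# Prints a two-line summary for the node it runs on.
check_hfi1() {
    if module_loaded hfi1 "${1:-/proc/modules}"; then
        echo "hfi1 module: loaded"
    else
        echo "hfi1 module: NOT loaded"
    fi
    if device_present "${2:-/dev/hfi1_0}"; then
        echo "device node: present"
    else
        echo "device node: missing"
    fi
}
```

Source the snippet and run `check_hfi1` on each node. If the module is reported as not loaded, `modprobe hfi1` (as root) is the usual next step before checking the link state with `opainfo`.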

_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Thursday, December 08, 2016 9:23 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] device failed to appear .. Connection timed out

hello Daniele,

Could you post the output of the ompi_info command?  I'm noticing that the Open MPI RPMs 
that came with the rhel7.2 distro on
one of our systems were built to support psm2/hfi1.

Two things.  First, could you try running your applications with

mpirun --mca pml ob1 (all the rest of your args)

and see if that works?
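
For context, the ob1 PML sends traffic through the BTL components rather than through an MTL such as psm2, which is why forcing it is a useful test here. A small sketch for seeing which components a given build contains follows. `list_components` is a hypothetical helper, `./my_app` is a placeholder, and the sed pattern assumes ompi_info lines of the form "MCA mtl: psm2 (MCA v2.0.0, ...)".

```shell
# Reads `ompi_info` output on stdin and prints the names of the
# components built for one framework (e.g. mtl, pml, btl).
list_components() {
    framework="$1"
    sed -n "s/^ *MCA ${framework}: \([^ ]*\).*/\1/p"
}

# Typical use (not run here):
#   ompi_info | list_components mtl
#   ompi_info | list_components pml
#
# Two equivalent ways to force the ob1 PML:
#   mpirun --mca pml ob1 -np 2 ./my_app
#   OMPI_MCA_pml=ob1 mpirun -np 2 ./my_app
```

If `psm2` shows up under mtl, that build can try to open the hfi1 device at startup, which would be consistent with the timeout message.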

Second, what sort of system are you using?  Is this a cluster?  If it is, you 
may want to check whether
you have a situation where it's an Omni-Path interconnect and you have the 
psm2/hfi1 packages installed,
but for some reason the Omni-Path HCAs themselves are not active.

On one of our Omni-Path systems the following hfi1-related RPMs are installed:

hfidiags-0.8-13.x86_64
hfi1-psm-devel-0.7-244.x86_64
libhfi1verbs-0.5-16.el7.x86_64
hfi1-psm-0.7-244.x86_64
hfi1-firmware-0.9-36.noarch
hfi1-psm-compat-0.7-244.x86_64
libhfi1verbs-devel-0.5-16.el7.x86_64
hfi1-0.11.3.10.0_327.el7.x86_64-245.x86_64
hfi1-firmware_debug-0.9-36.noarch
hfi1-diagtools-sw-0.8-13.x86_64
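
To produce a comparable list on another node, something like the following sketch could be used. `filter_hfi_packages` is a hypothetical helper, and matching on "hfi" or "psm" is only an assumption about how the relevant packages are named.

```shell
# Filters a package list (one name per line, e.g. from `rpm -qa`)
# down to entries whose names mention hfi or psm, sorted bytewise.
filter_hfi_packages() {
    grep -i -E 'hfi|psm' | LC_ALL=C sort
}

# Typical use (not run here):
#   rpm -qa | filter_hfi_packages
```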



Howard

2016-12-08 8:45 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:
Sounds like something didn’t quite get configured right, or maybe you have a 
library installed that isn’t quite set up correctly, or...

Regardless, we generally advise building from source to avoid such problems. Is 
there some reason not to just do so?

On Dec 8, 2016, at 6:16 AM, Daniele Tartarini 
<d.tartar...@sheffield.ac.uk> wrote:

Hi,

On a Red Hat 7.2 machine, I've installed the Open MPI package distributed via Yum:

        openmpi-devel.x86_64                 1.10.3-3.el7

With any code I try to run (including the mpitests-*), I get the following message, 
with slight variants:

         my_machine.171619hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out

Is anyone able to help me in identifying the source of the problem?
Anyway,  /dev/hfi1_0 doesn't exist.

If I use an OpenMPI version compiled from source I have no issue (gcc 4.8.5).

many thanks in advance.

cheers
Daniele
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

