> Anyway, /dev/hfi1_0 doesn't exist.

Make sure you have the hfi1 module/driver loaded. In addition, please confirm that the links are in the active state on all the nodes by running `opainfo`.
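The checks above could be scripted roughly as follows. This is a minimal sketch, assuming a Linux node where loaded modules are listed in /proc/modules and where `opainfo` is on the PATH; the helper names `module_loaded` and `device_present` are made up for illustration and are not part of any Omni-Path tool.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any Omni-Path tool.

# module_loaded NAME [FILE]: succeed if NAME is listed as a loaded module.
# FILE defaults to /proc/modules, where Linux lists loaded kernel modules.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# device_present PATH: succeed if PATH exists and is a character device.
device_present() {
    [ -c "$1" ]
}

if module_loaded hfi1; then
    echo "hfi1 module: loaded"
else
    echo "hfi1 module: NOT loaded (try 'modprobe hfi1' as root)"
fi

if device_present /dev/hfi1_0; then
    echo "/dev/hfi1_0: present"
else
    echo "/dev/hfi1_0: missing"
fi

# opainfo reports the port state; check that the link is active.
if command -v opainfo >/dev/null 2>&1; then
    opainfo || echo "opainfo exited with an error"
else
    echo "opainfo: not found in PATH"
fi
```

Run it on every node, since a single node with an inactive link is enough to cause trouble for a multi-node job.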
_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Thursday, December 08, 2016 9:23 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] device failed to appear .. Connection timed out

Hello Daniele,

Could you post the output from the ompi_info command? I'm noticing that, in the RPMs that came with the RHEL 7.2 distro on one of our systems, Open MPI was built to support psm2/hfi1.

Two things. First, could you try running applications with

mpirun --mca pml ob1 (all the rest of your args)

and see if that works?

Second, what sort of system are you using? Is this a cluster? If it is, you may want to check whether it's an Omni-Path interconnect and you have the psm2/hfi1 packages installed, but for some reason the Omni-Path HCAs themselves are not active.

On one of our Omni-Path systems the following hfi1-related RPMs are installed:

hfidiags-0.8-13.x86_64
hfi1-psm-devel-0.7-244.x86_64
libhfi1verbs-0.5-16.el7.x86_64
hfi1-psm-0.7-244.x86_64
hfi1-firmware-0.9-36.noarch
hfi1-psm-compat-0.7-244.x86_64
libhfi1verbs-devel-0.5-16.el7.x86_64
hfi1-0.11.3.10.0_327.el7.x86_64-245.x86_64
hfi1-firmware_debug-0.9-36.noarc
hfi1-diagtools-sw-0.8-13.x86_64

Howard

2016-12-08 8:45 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:

Sounds like something didn’t quite get configured right, or maybe you have a library installed that isn’t quite set up correctly, or...

Regardless, we generally advise building from source to avoid such problems. Is there some reason not to just do so?
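Howard's two suggestions could be tried along these lines. This is only a sketch: `./my_app` and the process count of 4 are placeholders for the real binary and arguments, and `psm_lines` is a made-up helper that simply filters `ompi_info` output. Forcing the ob1 PML should keep Open MPI from selecting the PSM/PSM2 path, which is the point of the test.

```shell
#!/bin/sh
# Sketch only: APP and NP are placeholders; psm_lines is an illustrative helper.
APP=${APP:-./my_app}
NP=${NP:-4}

# psm_lines: keep only the lines on stdin that mention PSM or PSM2.
psm_lines() {
    grep -i 'psm'
}

if command -v ompi_info >/dev/null 2>&1; then
    # Shows whether this Open MPI build lists any PSM/PSM2 components.
    ompi_info | psm_lines || echo "no PSM components listed"
else
    echo "ompi_info: not found in PATH"
fi

if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Force the ob1 PML, as suggested above, and see whether the error goes away.
    mpirun --mca pml ob1 -np "$NP" "$APP"
else
    echo "would run: mpirun --mca pml ob1 -np $NP $APP"
fi
```

If the job runs cleanly with ob1 but fails without it, that points at the PSM2/hfi1 setup rather than at the application.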
On Dec 8, 2016, at 6:16 AM, Daniele Tartarini <d.tartar...@sheffield.ac.uk> wrote:

Hi,

I've installed Open MPI on Red Hat 7.2 as distributed via Yum:

openmpi-devel.x86_64 1.10.3-3.el7

With any code I try to run (including the mpitests-*), I get the following message, with slight variants:

my_machine.171619hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out

Is anyone able to help me identify the source of the problem? Anyway, /dev/hfi1_0 doesn't exist.

If I use an Open MPI version compiled from source, I have no issue (gcc 4.8.5).

Many thanks in advance.

Cheers,
Daniele

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users