Hi, many thanks for your reply. I have an Intel S2600IP motherboard. It is a standalone server, and I cannot see any Omni-Path device, so there are no such modules either. `opainfo` is not available on my system.
Am I missing anything?

cheers
Daniele

On 8 December 2016 at 17:55, Cabral, Matias A <matias.a.cab...@intel.com> wrote:

> >Anyway, /dev/hfi1_0 doesn't exist.
>
> Make sure you have the hfi1 module/driver loaded.
>
> In addition, please confirm the links are in active state on all the nodes
> with `opainfo`.
>
> _MAC
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of *Howard Pritchard
> *Sent:* Thursday, December 08, 2016 9:23 AM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] device failed to appear .. Connection timed out
>
> hello Daniele,
>
> Could you post the output from the ompi_info command? I'm noticing that the
> RPMs that came with the RHEL 7.2 distro on one of our systems were built to
> support psm2/hfi-1.
>
> Two things. First, could you try running applications with
>
>    mpirun --mca pml ob1 (all the rest of your args)
>
> and see if that works?
>
> Second, what sort of system are you using? Is this a cluster? If it is, you
> may want to check whether you have a situation where it's an Omni-Path
> interconnect and you have the psm2/hfi1 packages installed, but for some
> reason the Omni-Path HCAs themselves are not active.
>
> On one of our Omni-Path systems the following hfi1-related RPMs are
> installed:
>
>    hfidiags-0.8-13.x86_64
>    hfi1-psm-devel-0.7-244.x86_64
>    libhfi1verbs-0.5-16.el7.x86_64
>    hfi1-psm-0.7-244.x86_64
>    hfi1-firmware-0.9-36.noarch
>    hfi1-psm-compat-0.7-244.x86_64
>    libhfi1verbs-devel-0.5-16.el7.x86_64
>    hfi1-0.11.3.10.0_327.el7.x86_64-245.x86_64
>    hfi1-firmware_debug-0.9-36.noarc
>    hfi1-diagtools-sw-0.8-13.x86_64
>
> Howard
>
> 2016-12-08 8:45 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:
>
> Sounds like something didn't quite get configured right, or maybe you have
> a library installed that isn't quite set up correctly, or...
>
> Regardless, we generally advise building from source to avoid such
> problems.
> Is there some reason not to just do so?
>
> On Dec 8, 2016, at 6:16 AM, Daniele Tartarini <d.tartar...@sheffield.ac.uk> wrote:
>
> Hi,
>
> I've installed the Open MPI package distributed via Yum on Red Hat 7.2:
>
>    openmpi-devel.x86_64 1.10.3-3.el7
>
> Any code I try to run (including the mpitests-*) gives the following
> message, with slight variants:
>
>    my_machine.171619hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
>
> Is anyone able to help me identify the source of the problem?
> Anyway, /dev/hfi1_0 doesn't exist.
>
> If I use an Open MPI version compiled from source (gcc 4.8.5), I have no issue.
>
> many thanks in advance.
>
> cheers
> Daniele
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

--
Daniele Tartarini
Post-Doctoral Research Associate
Dept. Mechanical Engineering & INSIGNEO, institute for *in silico* medicine,
University of Sheffield, Sheffield, UK
linkedIn <http://uk.linkedin.com/in/danieletartarini>
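Matias's check (is hfi1 loaded, does /dev/hfi1_0 exist?) and Howard's `--mca pml ob1` suggestion can be combined into a small launcher. The sketch below is a hypothetical script, here called `run_mpi.sh`, that is not part of Open MPI or of the hfi1 RPMs. It assumes that the presence of `/sys/module/hfi1` and `/dev/hfi1_0` is a reasonable proxy for a usable Omni-Path adapter, and it does not check link state (on nodes that have an adapter, `opainfo` is still the tool for that). Whether forcing ob1 is enough to avoid the `hfi_wait_for_device` timeout with the RHEL 7.2 RPM is exactly what Howard asked Daniele to test, so treat it as an experiment rather than a fix.

```shell
#!/bin/sh
# run_mpi.sh: hypothetical wrapper, not shipped with Open MPI or the hfi1 RPMs.
# Usage: ./run_mpi.sh -np 4 ./my_app   (all arguments are passed to mpirun)

# True if the hfi1 kernel module is loaded. /sys/module/<name> appears for
# loaded modules, so this does not depend on lsmod being installed.
# Assumption: a loaded module is a reasonable proxy for Omni-Path hardware.
hfi1_module_loaded() {
    [ -d /sys/module/hfi1 ]
}

# True if the first HFI unit's device node exists; this is the device node
# named in the "failed to appear" error message.
hfi1_device_present() {
    [ -e /dev/hfi1_0 ]
}

# Print the extra mpirun options for this node: nothing when an Omni-Path
# device looks usable, otherwise force the ob1 PML so that the PSM-based
# path is not selected (the workaround suggested in the thread).
mpirun_pml_args() {
    if hfi1_module_loaded && hfi1_device_present; then
        return 0    # print nothing: let Open MPI choose the PML itself
    fi
    printf '%s\n' '--mca pml ob1'
}

# Only launch when arguments were given, so the functions can be reused alone.
if [ "$#" -gt 0 ]; then
    extra=$(mpirun_pml_args)
    echo "run_mpi.sh: extra mpirun options: ${extra:-<none>}" >&2
    # $extra is intentionally unquoted so "--mca pml ob1" splits into
    # three separate arguments (and an empty value adds none).
    exec mpirun $extra "$@"
fi
```

For example, `./run_mpi.sh -np 4 ./my_app` on a standalone node without Omni-Path hardware would run `mpirun --mca pml ob1 -np 4 ./my_app`. Setting `OMPI_MCA_pml=ob1` in the environment is an equivalent way to pass the same MCA parameter without a wrapper.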