BTLs attempted: self

That should only allow a single process to communicate with itself.
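
If both ranks are on the same host, a shared-memory BTL presumably has to be in the list as well. A rough sketch, assuming this build ships the vader BTL (some builds call the shared-memory BTL sm; 'ompi_info | grep btl' shows what is actually there) and that ob1 is the PML in use once pami is excluded. The btl_list helper and ./a.out are placeholders of mine, not anything from Spectrum MPI:

```shell
# Pick a BTL list for a single-node run.
# Assumption: the "vader" shared-memory BTL exists in this build;
# check with:  ompi_info | grep btl
btl_list() {
    # One rank only ever talks to itself; two or more ranks on the
    # same host also need a shared-memory transport.
    if [ "$1" -le 1 ]; then
        echo "self"
    else
        echo "self,vader"
    fi
}

NP=2
# Print the command instead of running it, so it can be reviewed first.
# ./a.out stands in for the Hello World binary.
echo "mpirun -np $NP --mca pml ob1 --mca btl $(btl_list "$NP") ./a.out"
```

Adding --mca btl_base_verbose 100 to the printed command should show which BTLs were considered and which were discarded.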

On 19 May 2017 at 09:23, Gabriele Fatigati <g.fatig...@cineca.it> wrote:

> Oh no, by using two procs:
>
> findActiveDevices Error
> We found no active IB device ports
> findActiveDevices Error
> We found no active IB device ports
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
>   Process 1 ([[12380,1],0]) is on host: openpower
>   Process 2 ([[12380,1],1]) is on host: openpower
>   BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is unreachable
> from another. This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used. Your MPI job will now abort.
>
> You may wish to try to narrow down the problem;
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>    available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>    if using MTL-based communications) to see exactly which
>    communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
> [openpower:88867] 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [openpower:88867] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
>
> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati <g.fatig...@cineca.it>:
>
>> Hi Gilles,
>>
>> using your command with one MPI proc I get:
>>
>> findActiveDevices Error
>> We found no active IB device ports
>> Hello world from rank 0 out of 1 processors
>>
>> So it seems to work, apart from the error message.
>>
>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
>>
>>> Gabriele,
>>>
>>> so it seems pml/pami assumes there is an infiniband card available (!)
>>>
>>> i guess IBM folks will comment on that shortly.
>>>
>>> meanwhile, you do not need pami since you are running on a single node
>>>
>>> mpirun --mca pml ^pami ...
>>>
>>> should do the trick
>>>
>>> (if it does not work, can you run and post the logs)
>>>
>>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>>
>>>> Hi John,
>>>> Infiniband is not used, there is a single node on this machine.
>>>>
>>>> 2017-05-19 8:50 GMT+02:00 John Hearns via users <users@lists.open-mpi.org>:
>>>>
>>>> Gabriele, please run 'ibv_devinfo'
>>>> It looks to me like you may have the physical interface cards in
>>>> these systems, but you do not have the correct drivers or
>>>> libraries loaded.
>>>>
>>>> I have had similar messages when using Infiniband on x86 systems -
>>>> which did not have libibverbs installed.
>>>>
>>>> On 19 May 2017 at 08:41, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
>>>>
>>>> Hi Gilles, using your command:
>>>>
>>>> [openpower:88536] mca: base: components_register: registering framework pml components
>>>> [openpower:88536] mca: base: components_register: found loaded component pami
>>>> [openpower:88536] mca: base: components_register: component pami register function successful
>>>> [openpower:88536] mca: base: components_open: opening pml components
>>>> [openpower:88536] mca: base: components_open: found loaded component pami
>>>> [openpower:88536] mca: base: components_open: component pami open function successful
>>>> [openpower:88536] select: initializing pml component pami
>>>> findActiveDevices Error
>>>> We found no active IB device ports
>>>> [openpower:88536] select: init returned failure for component pami
>>>> [openpower:88536] PML pami cannot be selected
>>>> --------------------------------------------------------------------------
>>>> No components were able to be opened in the pml framework.
>>>>
>>>> This typically means that either no components of this type were
>>>> installed, or none of the installed componnets can be loaded.
>>>> Sometimes this means that shared libraries required by these
>>>> components are unable to be found/loaded.
>>>>
>>>>   Host:      openpower
>>>>   Framework: pml
>>>> --------------------------------------------------------------------------
>>>>
>>>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
>>>>
>>>> Gabriele,
>>>>
>>>> pml/pami is here, at least according to ompi_info
>>>>
>>>> can you update your mpirun command like this
>>>>
>>>> mpirun --mca pml_base_verbose 100 ..
>>>>
>>>> and post the output ?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>>>
>>>> Hi Gilles, attached the requested info
>>>>
>>>> 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>>>
>>>> Gabriele,
>>>>
>>>> can you
>>>> ompi_info --all | grep pml
>>>>
>>>> also, make sure there is nothing in your environment pointing to
>>>> another Open MPI install
>>>> for example
>>>> ldd a.out
>>>> should only point to IBM libraries
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Thursday, May 18, 2017, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
>>>>
>>>> Dear OpenMPI users and developers, I'm using IBM Spectrum MPI
>>>> 10.1.0 based on OpenMPI, so I hope some MPI experts
>>>> can help me to solve the problem.
>>>>
>>>> When I run a simple Hello World MPI program, I get the following
>>>> error message:
>>>>
>>>> A requested component was not found, or was unable to be opened. This
>>>> means that this component is either not installed or is unable to be
>>>> used on your system (e.g., sometimes this means that shared libraries
>>>> that the component requires are unable to be found/loaded). Note that
>>>> Open MPI stopped checking at the first component that it did not find.
>>>>
>>>> Host:      openpower
>>>> Framework: pml
>>>> Component: pami
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>>   mca_pml_base_open() failed
>>>>   --> Returned "Not found" (-13) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** on a NULL communicator
>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> ***    and potentially your MPI job)
>>>>
>>>> My sysadmin used official IBM Spectrum packages to install
>>>> MPI, so it's quite strange that there are some components
>>>> missing (pami). Any help? Thanks
>>>>
>>>> --
>>>> Ing. Gabriele Fatigati
>>>>
>>>> HPC specialist
>>>>
>>>> SuperComputing Applications and Innovation Department
>>>>
>>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>>
>>>> www.cineca.it                Tel: +39 051 6171722
>>>>
>>>> g.fatigati [AT] cineca.it

_______________________________________________ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users