I am not sure I agree with that. (a) The original error message from Gabriele was quite clear: MPI could not find an interface card which was up, so it would not run. (b) Nysal actually pointed out the solution, which looks good after reading the documentation: use pami_noib. (c) Having discussions like this helps us all to learn. I have made many stupid replies on this list, and looking at problems like this has helped me to learn.
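For the archives, the two workarounds that came up in this thread would look roughly as follows. The option spellings are as I understand them from the Spectrum MPI documentation (-pami_noib) and from Gilles' suggestion (--mca pml ^pami) — please check them against your local docs, and note that ./hello_world is just a placeholder binary:

```shell
# Spectrum MPI option to run PAMI without an active InfiniBand port
# (spelling assumed from the IBM Spectrum MPI documentation):
mpirun -np 2 -pami_noib ./hello_world

# Alternative suggested by Gilles: exclude the pami PML so Open MPI
# falls back to ob1 with the shared-memory BTLs on a single node:
mpirun -np 2 --mca pml ^pami ./hello_world
```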
On 19 May 2017 at 11:01, r...@open-mpi.org <r...@open-mpi.org> wrote:

> If I might interject here before lots of time is wasted. Spectrum MPI is
> an IBM -product- and is not free. What you are likely running into is that
> their license manager is blocking you from running, albeit without a really
> nice error message. I'm sure that's something they are working on.
>
> If you really want to use Spectrum MPI, I suggest you contact them about
> purchasing it.
>
> On May 19, 2017, at 1:16 AM, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
>
> Hi Gilles, attached is the output of:
>
> mpirun --mca btl_base_verbose 100 -np 2 ...
>
> 2017-05-19 9:43 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
>
>> Gabriele,
>>
>> can you run
>>
>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>
>> so we can figure out why neither sm nor vader is used?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:
>>
>>> Oh no, by using two procs:
>>>
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> --------------------------------------------------------------------------
>>> At least one pair of MPI processes are unable to reach each other for
>>> MPI communications.  This means that no Open MPI device has indicated
>>> that it can be used to communicate between these processes.  This is
>>> an error; Open MPI requires that all MPI processes be able to reach
>>> each other.  This error can sometimes be the result of forgetting to
>>> specify the "self" BTL.
>>>
>>> Process 1 ([[12380,1],0]) is on host: openpower
>>> Process 2 ([[12380,1],1]) is on host: openpower
>>> BTLs attempted: self
>>>
>>> Your MPI job is now going to abort; sorry.
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>> --------------------------------------------------------------------------
>>> MPI_INIT has failed because at least one MPI process is unreachable
>>> from another.  This *usually* means that an underlying communication
>>> plugin -- such as a BTL or an MTL -- has either not loaded or not
>>> allowed itself to be used.  Your MPI job will now abort.
>>>
>>> You may wish to try to narrow down the problem;
>>> * Check the output of ompi_info to see which BTL/MTL plugins are
>>>   available.
>>> * Run your application with MPI_THREAD_SINGLE.
>>> * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>>>   if using MTL-based communications) to see exactly which
>>>   communication plugins were considered and/or discarded.
>>> --------------------------------------------------------------------------
>>> [openpower:88867] 1 more process has sent help message
>>> help-mca-bml-r2.txt / unreachable proc
>>> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>> [openpower:88867] 1 more process has sent help message
>>> help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
>>>
>>> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati <g.fatig...@cineca.it>:
>>>
>>> Hi Gilles,
>>>
>>> using your command with one MPI proc I get:
>>>
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> Hello world from rank 0 out of 1 processors
>>>
>>> So it seems to work, apart from the error message.
>>> >>> >>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp >>> <mailto:gil...@rist.or.jp>>: >>> >>> Gabriele, >>> >>> >>> so it seems pml/pami assumes there is an infiniband card >>> available (!) >>> >>> i guess IBM folks will comment on that shortly. >>> >>> >>> meanwhile, you do not need pami since you are running on a >>> single node >>> >>> mpirun --mca pml ^pami ... >>> >>> should do the trick >>> >>> (if it does not work, can run and post the logs) >>> >>> mpirun --mca pml ^pami --mca pml_base_verbose 100 ... >>> >>> >>> Cheers, >>> >>> >>> Gilles >>> >>> >>> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote: >>> >>> Hi John, >>> Infiniband is not used, there is a single node on this >>> machine. >>> >>> 2017-05-19 8:50 GMT+02:00 John Hearns via users >>> <users@lists.open-mpi.org >>> <mailto:users@lists.open-mpi.org> >>> <mailto:users@lists.open-mpi.org >>> <mailto:users@lists.open-mpi.org>>>: >>> >>> Gabriele, pleae run 'ibv_devinfo' >>> It looks to me like you may have the physical >>> interface cards in >>> these systems, but you do not have the correct drivers or >>> libraries loaded. >>> >>> I have had similar messages when using Infiniband on >>> x86 systems - >>> which did not have libibverbs installed. 
>>> On 19 May 2017 at 08:41, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
>>>
>>> Hi Gilles, using your command:
>>>
>>> [openpower:88536] mca: base: components_register: registering
>>> framework pml components
>>> [openpower:88536] mca: base: components_register: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_register: component pami
>>> register function successful
>>> [openpower:88536] mca: base: components_open: opening pml components
>>> [openpower:88536] mca: base: components_open: found loaded
>>> component pami
>>> [openpower:88536] mca: base: components_open: component pami
>>> open function successful
>>> [openpower:88536] select: initializing pml component pami
>>> findActiveDevices Error
>>> We found no active IB device ports
>>> [openpower:88536] select: init returned failure for component pami
>>> [openpower:88536] PML pami cannot be selected
>>> --------------------------------------------------------------------------
>>> No components were able to be opened in the pml framework.
>>>
>>> This typically means that either no components of this type were
>>> installed, or none of the installed components can be loaded.
>>> Sometimes this means that shared libraries required by these
>>> components are unable to be found/loaded.
>>>
>>>   Host:      openpower
>>>   Framework: pml
>>> --------------------------------------------------------------------------
>>>
>>> 2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
>>>
>>> Gabriele,
>>>
>>> pml/pami is here, at least according to ompi_info.
>>>
>>> Can you update your mpirun command like this:
>>>
>>> mpirun --mca pml_base_verbose 100 ...
>>>
>>> and post the output?
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>>
>>> Hi Gilles, attached is the requested info.
>>>
>>> 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>>
>>> Gabriele,
>>>
>>> can you run
>>>
>>> ompi_info --all | grep pml
>>>
>>> Also, make sure there is nothing in your environment pointing to
>>> another Open MPI install. For example,
>>>
>>> ldd a.out
>>>
>>> should only point to IBM libraries.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Thursday, May 18, 2017, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
>>>
>>> Dear Open MPI users and developers,
>>> I'm using IBM Spectrum MPI 10.1.0, which is based on Open MPI, so I
>>> hope there is some MPI expert who can help me solve the problem.
>>>
>>> When I run a simple Hello World MPI program, I get the following
>>> error message:
>>>
>>> A requested component was not found, or was unable to be opened.  This
>>> means that this component is either not installed or is unable to be
>>> used on your system (e.g., sometimes this means that shared libraries
>>> that the component requires are unable to be found/loaded).  Note that
>>> Open MPI stopped checking at the first component that it did not find.
>>> Host:      openpower
>>> Framework: pml
>>> Component: pami
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> mca_pml_base_open() failed
>>> --> Returned "Not found" (-13) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>>
>>> My sysadmin used official IBM Spectrum packages to install MPI, so it's
>>> quite strange that there are some components missing (pami). Any help?
>>> Thanks
>>>
>>> --
>>> Ing. Gabriele Fatigati
>>>
>>> HPC specialist
>>>
>>> SuperComputing Applications and Innovation Department
>>>
>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>
>>> www.cineca.it   Tel: +39 051 6171722
>>>
>>> g.fatigati [AT] cineca.it
>
> <output_mpirun>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users