ps. One takeaway for everyone working with MPI:
turn up the error logging or debug level,
then PAY ATTENTION to the error messages.
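
With Open MPI that can be as simple as raising the MCA verbosity on the
frameworks you suspect (the same knobs that come up later in this thread)
and checking what was actually built in. As a rough sketch - adjust the
binary name and process count for your own job:

    mpirun --mca btl_base_verbose 100 --mca pml_base_verbose 100 -np 2 ./a.out
    ompi_info | grep -i btl        # which transport components are installed at all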

I have spent a LOT of my time doing just that - with Open MPI and with Intel
MPI over Omni-Path and other interconnects in the dim and distant past.
Whoever wrote the software did not put that error trap in the code for a
laugh. It took effort, so pay attention to it.
Even if it seems stupid to you, or runs contrary to what you "know" is true
about the system, give it some attention.

My own recent story is about Omni-Path - I knew that the devices were
physically up, and I could run the diagnostics etc.
But the particular MPI program failed to start - running ibv_devinfo
eventually led me to find that the ibverbs library was not installed.
I am not flagging this up as a particular example to be teased apart - just
as a general case.
Supercomputer clusters running over high performance fabrics are complicated
beasts. It is not sufficient to plug in cards and cables.
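
As a rough sketch of the kind of sanity checks I mean (assuming a verbs-based
fabric is supposed to be present, with ./a.out standing in for your own
binary - adjust for your stack):

    ibv_devinfo                       # are the adapters visible and the ports ACTIVE?
    ldconfig -p | grep libibverbs     # is the verbs library actually installed?
    ldd ./a.out                       # is the binary linked against the MPI/fabric libraries you expect?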

On 19 May 2017 at 11:12, John Hearns <hear...@googlemail.com> wrote:

> I am not sure I agree with that.
> (a) the original error message from Gabriele was quite clear - the MPI
> could not find an interface card which was up, so it would not run.
> (b) Nysal actually pointed out the solution, which looks good - after
> reading the documentation, use pami_noib
> (c) Having discussions like this helps us all to learn. I have made many
> stupid replies on this list, and looking at problems like this has helped
> me to learn.
>
>
>
>
> On 19 May 2017 at 11:01, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
>> If I might interject here before lots of time is wasted. Spectrum MPI is
>> an IBM -product- and is not free. What you are likely running into is that
>> their license manager is blocking you from running, albeit without a really
>> nice error message. I’m sure that’s something they are working on.
>>
>> If you really want to use Spectrum MPI, I suggest you contact them about
>> purchasing it.
>>
>>
>> On May 19, 2017, at 1:16 AM, Gabriele Fatigati <g.fatig...@cineca.it>
>> wrote:
>>
>> Hi Gilles, attached is the output of:
>>
>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>
>> 2017-05-19 9:43 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
>>
>>> Gabriele,
>>>
>>>
>>> can you
>>>
>>> mpirun --mca btl_base_verbose 100 -np 2 ...
>>>
>>>
>>> so we can figure out why neither sm nor vader is used ?
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>>
>>> On 5/19/2017 4:23 PM, Gabriele Fatigati wrote:
>>>
>>>> Oh no, by using two procs:
>>>>
>>>>
>>>> findActiveDevices Error
>>>> We found no active IB device ports
>>>> findActiveDevices Error
>>>> We found no active IB device ports
>>>> --------------------------------------------------------------------------
>>>> At least one pair of MPI processes are unable to reach each other for
>>>> MPI communications.  This means that no Open MPI device has indicated
>>>> that it can be used to communicate between these processes.  This is
>>>> an error; Open MPI requires that all MPI processes be able to reach
>>>> each other.  This error can sometimes be the result of forgetting to
>>>> specify the "self" BTL.
>>>>
>>>>   Process 1 ([[12380,1],0]) is on host: openpower
>>>>   Process 2 ([[12380,1],1]) is on host: openpower
>>>>   BTLs attempted: self
>>>>
>>>> Your MPI job is now going to abort; sorry.
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** on a NULL communicator
>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> ***    and potentially your MPI job)
>>>> *** An error occurred in MPI_Init
>>>> *** on a NULL communicator
>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> ***    and potentially your MPI job)
>>>> --------------------------------------------------------------------------
>>>> MPI_INIT has failed because at least one MPI process is unreachable
>>>> from another.  This *usually* means that an underlying communication
>>>> plugin -- such as a BTL or an MTL -- has either not loaded or not
>>>> allowed itself to be used.  Your MPI job will now abort.
>>>>
>>>> You may wish to try to narrow down the problem;
>>>>  * Check the output of ompi_info to see which BTL/MTL plugins are
>>>>    available.
>>>>  * Run your application with MPI_THREAD_SINGLE.
>>>>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>>>>    if using MTL-based communications) to see exactly which
>>>>    communication plugins were considered and/or discarded.
>>>> --------------------------------------------------------------------------
>>>> [openpower:88867] 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
>>>> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>> [openpower:88867] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati <g.fatig...@cineca.it>:
>>>>
>>>>     Hi Gilles,
>>>>
>>>>     using your command with one MPI proc I get:
>>>>
>>>>     findActiveDevices Error
>>>>     We found no active IB device ports
>>>>     Hello world from rank 0  out of 1 processors
>>>>
>>>>     So it seems to work apart from the error message.
>>>>
>>>>
>>>>     2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
>>>>
>>>>         Gabriele,
>>>>
>>>>
>>>>         so it seems pml/pami assumes there is an infiniband card
>>>>         available (!)
>>>>
>>>>         i guess IBM folks will comment on that shortly.
>>>>
>>>>
>>>>         meanwhile, you do not need pami since you are running on a
>>>>         single node
>>>>
>>>>         mpirun --mca pml ^pami ...
>>>>
>>>>         should do the trick
>>>>
>>>>         (if it does not work, can you run the following and post the logs)
>>>>
>>>>         mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
>>>>
>>>>
>>>>         Cheers,
>>>>
>>>>
>>>>         Gilles
>>>>
>>>>
>>>>         On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>>>>
>>>>             Hi John,
>>>>             Infiniband is not used, there is a single node on this
>>>>             machine.
>>>>
>>>>             2017-05-19 8:50 GMT+02:00 John Hearns via users
>>>>             <users@lists.open-mpi.org>:
>>>>
>>>>                 Gabriele, please run 'ibv_devinfo'.
>>>>                 It looks to me like you may have the physical interface
>>>>                 cards in these systems, but you do not have the correct
>>>>                 drivers or libraries loaded.
>>>>
>>>>                 I have had similar messages when using Infiniband on
>>>>                 x86 systems - which did not have libibverbs installed.
>>>>
>>>>
>>>>                 On 19 May 2017 at 08:41, Gabriele Fatigati
>>>>                 <g.fatig...@cineca.it> wrote:
>>>>
>>>>                     Hi Gilles, using your command:
>>>>
>>>>                     [openpower:88536] mca: base: components_register: registering framework pml components
>>>>                     [openpower:88536] mca: base: components_register: found loaded component pami
>>>>                     [openpower:88536] mca: base: components_register: component pami register function successful
>>>>                     [openpower:88536] mca: base: components_open: opening pml components
>>>>                     [openpower:88536] mca: base: components_open: found loaded component pami
>>>>                     [openpower:88536] mca: base: components_open: component pami open function successful
>>>>                     [openpower:88536] select: initializing pml component pami
>>>>                     findActiveDevices Error
>>>>                     We found no active IB device ports
>>>>                     [openpower:88536] select: init returned failure for component pami
>>>>                     [openpower:88536] PML pami cannot be selected
>>>>                     --------------------------------------------------------------------------
>>>>                     No components were able to be opened in the pml framework.
>>>>
>>>>                     This typically means that either no components of this type were
>>>>                     installed, or none of the installed componnets can be loaded.
>>>>                     Sometimes this means that shared libraries required by these
>>>>                     components are unable to be found/loaded.
>>>>
>>>>                       Host:      openpower
>>>>                       Framework: pml
>>>>                     --------------------------------------------------------------------------
>>>>
>>>>
>>>>                     2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet
>>>>                     <gil...@rist.or.jp>:
>>>>
>>>>
>>>>                         Gabriele,
>>>>
>>>>
>>>>                         pml/pami is here, at least according to
>>>> ompi_info
>>>>
>>>>
>>>>                         can you update your mpirun command like this
>>>>
>>>>                         mpirun --mca pml_base_verbose 100 ..
>>>>
>>>>
>>>>                         and post the output ?
>>>>
>>>>
>>>>                         Cheers,
>>>>
>>>>                         Gilles
>>>>
>>>>                         On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
>>>>
>>>>                             Hi Gilles, attached the requested info
>>>>
>>>>                             2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet
>>>>                             <gilles.gouaillar...@gmail.com>:
>>>>
>>>>                                 Gabriele,
>>>>
>>>>                                 can you
>>>>                                 ompi_info --all | grep pml
>>>>
>>>>                                 also, make sure there is nothing in your
>>>>                             environment pointing to
>>>>                                 another Open MPI install
>>>>                                 for example
>>>>                                 ldd a.out
>>>>                                 should only point to IBM libraries
>>>>
>>>>                                 Cheers,
>>>>
>>>>                                 Gilles
>>>>
>>>>
>>>>                                 On Thursday, May 18, 2017, Gabriele Fatigati
>>>>                                 <g.fatig...@cineca.it> wrote:
>>>>
>>>>                                     Dear Open MPI users and developers,
>>>>                                     I'm using IBM Spectrum MPI 10.1.0, which is based on Open MPI,
>>>>                                     so I hope some MPI experts can help me solve the problem.
>>>>
>>>>                                     When I run a simple Hello World MPI program, I get the
>>>>                                     following error message:
>>>>
>>>>
>>>>                                     A requested component was not found, or was unable to be opened.  This
>>>>                                     means that this component is either not installed or is unable to be
>>>>                                     used on your system (e.g., sometimes this means that shared libraries
>>>>                                     that the component requires are unable to be found/loaded).  Note that
>>>>                                     Open MPI stopped checking at the first component that it did not find.
>>>>
>>>>                                     Host:      openpower
>>>>                                     Framework: pml
>>>>                                     Component: pami
>>>>                                     --------------------------------------------------------------------------
>>>>                                     --------------------------------------------------------------------------
>>>>                                     It looks like MPI_INIT failed for some reason; your parallel process is
>>>>                                     likely to abort.  There are many reasons that a parallel process can
>>>>                                     fail during MPI_INIT; some of which are due to configuration or
>>>>                                     environment problems.  This failure appears to be an internal failure;
>>>>                                     here's some additional information (which may only be relevant to an
>>>>                                     Open MPI developer):
>>>>
>>>>                                     mca_pml_base_open() failed
>>>>                                       --> Returned "Not found" (-13) instead of "Success" (0)
>>>>                                     --------------------------------------------------------------------------
>>>>                                     *** An error occurred in MPI_Init
>>>>                                     *** on a NULL communicator
>>>>                                     *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>                                     ***    and potentially your MPI job)
>>>>
>>>>                                     My sysadmin used official IBM Spectrum packages to install MPI, so it's
>>>>                                     quite strange that some components (pami) are missing. Any help? Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ing. Gabriele Fatigati
>>>>
>>>> HPC specialist
>>>>
>>>> SuperComputing Applications and Innovation Department
>>>>
>>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>>
>>>> www.cineca.it    Tel: +39 051 6171722
>>>>
>>>> g.fatigati [AT] cineca.it <http://cineca.it>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users@lists.open-mpi.org
>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>>
>>
>>
>>
>> --
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it                    Tel:   +39 051 6171722
>>
>> g.fatigati [AT] cineca.it
>> <output_mpirun>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
