Add --mca btl self,vader
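
For example (the executable name and process count below are just placeholders for your test program), combining this with Gilles' earlier suggestion quoted below, something like

  mpirun --mca pml ^pami --mca btl self,vader -np 2 ./hello_world

should let the two ranks reach each other over shared memory on the single node.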

-Nathan

> On May 19, 2017, at 1:23 AM, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
> 
> Oh no, by using two procs:
> 
> 
> findActiveDevices Error
> We found no active IB device ports
> findActiveDevices Error
> We found no active IB device ports
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> 
>   Process 1 ([[12380,1],0]) is on host: openpower
>   Process 2 ([[12380,1],1]) is on host: openpower
>   BTLs attempted: self
> 
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is unreachable
> from another.  This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used.  Your MPI job will now abort.
> 
> You may wish to try to narrow down the problem;
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>    available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>    if using MTL-based communications) to see exactly which
>    communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
> [openpower:88867] 1 more process has sent help message help-mca-bml-r2.txt / 
> unreachable proc
> [openpower:88867] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
> all help / error messages
> [openpower:88867] 1 more process has sent help message help-mpi-runtime.txt / 
> mpi_init:startup:pml-add-procs-fail
> 
> 
> 
> 
> 
> 2017-05-19 9:22 GMT+02:00 Gabriele Fatigati <g.fatig...@cineca.it>:
> Hi Gilles,
> 
> using your command with one MPI proc, I get:
> 
> findActiveDevices Error
> We found no active IB device ports
> Hello world from rank 0  out of 1 processors
> 
> So it seems to work, apart from the error message.
> 
> 
> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
> Gabriele,
> 
> 
> so it seems pml/pami assumes there is an infiniband card available (!)
> 
> i guess IBM folks will comment on that shortly.
> 
> 
> meanwhile, you do not need pami since you are running on a single node
> 
> mpirun --mca pml ^pami ...
> 
> should do the trick
> 
> (if it does not work, can you run the command below and post the logs)
> 
> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
> Hi John,
> Infiniband is not used; the job runs on a single node of this machine.
> 
> 2017-05-19 8:50 GMT+02:00 John Hearns via users <users@lists.open-mpi.org>:
> 
>     Gabriele, please run 'ibv_devinfo'
>     It looks to me like you may have the physical interface cards in
>     these systems, but you do not have the correct drivers or
>     libraries loaded.
> 
>     I have had similar messages when using Infiniband on x86 systems -
>     which did not have libibverbs installed.
> 
> 
>     On 19 May 2017 at 08:41, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
> 
>         Hi Gilles, using your command:
> 
>         [openpower:88536] mca: base: components_register: registering
>         framework pml components
>         [openpower:88536] mca: base: components_register: found loaded
>         component pami
>         [openpower:88536] mca: base: components_register: component
>         pami register function successful
>         [openpower:88536] mca: base: components_open: opening pml
>         components
>         [openpower:88536] mca: base: components_open: found loaded
>         component pami
>         [openpower:88536] mca: base: components_open: component pami
>         open function successful
>         [openpower:88536] select: initializing pml component pami
>         findActiveDevices Error
>         We found no active IB device ports
>         [openpower:88536] select: init returned failure for component pami
>         [openpower:88536] PML pami cannot be selected
>         
> --------------------------------------------------------------------------
>         No components were able to be opened in the pml framework.
> 
>         This typically means that either no components of this type were
>         installed, or none of the installed componnets can be loaded.
>         Sometimes this means that shared libraries required by these
>         components are unable to be found/loaded.
> 
>           Host:      openpower
>           Framework: pml
>         
> --------------------------------------------------------------------------
> 
> 
>         2017-05-19 7:03 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp>:
> 
>             Gabriele,
> 
> 
>             pml/pami is here, at least according to ompi_info
> 
> 
>             can you update your mpirun command like this
> 
>             mpirun --mca pml_base_verbose 100 ..
> 
> 
>             and post the output ?
> 
> 
>             Cheers,
> 
>             Gilles
> 
>             On 5/18/2017 10:41 PM, Gabriele Fatigati wrote:
> 
>                 Hi Gilles, attached the requested info
> 
>                 2017-05-18 15:04 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
> 
>                     Gabriele,
> 
>                     can you
>                     ompi_info --all | grep pml
> 
>                     also, make sure there is nothing in your
>                 environment pointing to
>                     another Open MPI install
>                     for example
>                     ldd a.out
>                     should only point to IBM libraries
> 
>                     Cheers,
> 
>                     Gilles
> 
> 
>                     On Thursday, May 18, 2017, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
> 
>                         Dear OpenMPI users and developers, I'm using
>                 IBM Spectrum MPI
>                         10.1.0 based on OpenMPI, so I hope there are
>                         10.1.0 based on OpenMPI, so I hope
>                         can help me to solve the problem.
> 
>                         When I run a simple Hello World MPI program, I
>                 get the follow
>                         error message:
> 
> 
>                         A requested component was not found, or was
>                 unable to be
>                         opened.  This
>                         means that this component is either not
>                 installed or is unable
>                         to be
>                         used on your system (e.g., sometimes this
>                 means that shared
>                         libraries
>                         that the component requires are unable to be
>                         found/loaded).  Note that
>                         Open MPI stopped checking at the first
>                 component that it did
>                         not find.
> 
>                         Host:      openpower
>                         Framework: pml
>                         Component: pami
>                 
> --------------------------------------------------------------------------
>                 
> --------------------------------------------------------------------------
>                         It looks like MPI_INIT failed for some reason;
>                 your parallel
>                         process is
>                         likely to abort. There are many reasons that a
>                 parallel
>                         process can
>                         fail during MPI_INIT; some of which are due to
>                 configuration
>                         or environment
>                         problems.  This failure appears to be an
>                 internal failure;
>                         here's some
>                         additional information (which may only be
>                 relevant to an Open MPI
>                         developer):
> 
>                         mca_pml_base_open() failed
>                           --> Returned "Not found" (-13) instead of
>                 "Success" (0)
>                 
> --------------------------------------------------------------------------
>                         *** An error occurred in MPI_Init
>                         *** on a NULL communicator
>                         *** MPI_ERRORS_ARE_FATAL (processes in this
>                 communicator will
>                         now abort,
>                         ***    and potentially your MPI job)
> 
>                         My sysadmin used official IBM Spectrum
>                 packages to install
>                         MPI, so it's quite strange that there are some
>                 components
>                         missing (pami). Any help? Thanks
> 
> -- 
> Ing. Gabriele Fatigati
> 
> HPC specialist
> 
> SuperComputing Applications and Innovation Department
> 
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> 
> www.cineca.it                    Tel:   +39 051 6171722
> 
> g.fatigati [AT] cineca.it           

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
