Maybe you will find what you need here:
https://www.open-mpi.org/faq/?category=tuning#setting-mca-params
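If you just want to see which MCA parameters exist and what they do, ompi_info can dump them with their help text; a quick sketch of the syntax (the parameter names below are only examples of the form, not a fix for your specific problem):

ompi_info --param all all --level 9          # list all MCA parameters with descriptions
ompi_info --param btl openib --level 9       # only the parameters of the openib BTL
mpirun --mca orte_base_help_aggregate 0 -np 2 ./your_program
export OMPI_MCA_orte_base_help_aggregate=0   # same parameter, set via the environment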
You can also try running gdb via mpirun to get debugging info:
mpirun -np 2 xterm -e gdb ./your_program
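If your compute nodes have no X/xterm available, a batch-mode variant is possible too; just a sketch, assuming gdb is installed on the nodes:

mpirun -np 2 gdb -batch -ex run -ex bt ./your_program    # runs each rank under gdb and prints a backtrace if it crashes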


2015-09-28 14:43 GMT+03:00 Sven Schumacher <schumac...@tfd.uni-hannover.de>:

> Hello,
>
> I've set up our new cluster with InfiniBand, using a combination of
> Debian, Torque/Maui and BeeGFS (formerly FhGFS).
>
> Every node has two InfiniBand ports, each of which has an IP address.
> One port is used for BeeGFS (which is working well) and the other one
> for MPI communication.
>
> I'm using Open MPI version 1.8.5, compiled with gcc/gfortran 4.9.2 and
> ibverbs support.
> The configure command was the following:
>
> Output of "ompi_info --parsable  -a -c" is attached as txt-file (all
> nodes are configured the same)
>
>
> The following infiniband-related kernel-modules are loaded:
> > mlx4_core             206165  1 mlx4_ib
> > rdma_ucm               22055  0
> > ib_uverbs              44693  1 rdma_ucm
> > rdma_cm                39518  2 ib_iser,rdma_ucm
> > iw_cm                  31011  1 rdma_cm
> > ib_umad                17311  0
> > mlx4_ib               136293  0
> > ib_cm                  39055  3 rdma_cm,ib_srp,ib_ipoib
> > ib_sa                  26986  6 rdma_cm,ib_cm,mlx4_ib,ib_srp,rdma_ucm,ib_ipoib
> > ib_mad                 39969  4 ib_cm,ib_sa,mlx4_ib,ib_umad
> > ib_core                68904  12 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_srp,ib_iser,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
> > ib_addr                17148  3 rdma_cm,ib_core,rdma_ucm
> > ib_iser                44204  0
> > iscsi_tcp              17580  0
> > libiscsi_tcp           21554  1 iscsi_tcp
> > libiscsi               48004  3 libiscsi_tcp,iscsi_tcp,ib_iser
> > scsi_transport_iscsi    77478  4 iscsi_tcp,ib_iser,libiscsi
> > ib_ipoib               85167  0
> > ib_srp                 39710  0
> > scsi_transport_srp     18194  1 ib_srp
> > scsi_tgt               17698  1 scsi_transport_srp
>
> When using mpiexec to run a job on a single node with 8 cores,
> everything works fine, but when mpiexec has to start a second process
> on another node, it doesn't start that process.
> What I already did:
>
> Testing SSH logins: works (password-less, using SSH keys).
> Testing name resolution: works.
>
> Used a "hello Word"-mpi-Program:
> > #include <mpi.h>
> > #include <stdio.h>
> >
> > int main(int argc, char** argv) {
> >     // Initialize the MPI environment
> >     MPI_Init(NULL, NULL);
> >
> >     // Get the number of processes
> >     int world_size;
> >     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> >
> >     // Get the rank of the process
> >     int world_rank;
> >     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> >
> >     // Get the name of the processor
> >     char processor_name[MPI_MAX_PROCESSOR_NAME];
> >     int name_len;
> >     MPI_Get_processor_name(processor_name, &name_len);
> >
> >     // Print off a hello world message
> >     printf("Hello world from processor %s, rank %d"
> >            " out of %d processors\n",
> >            processor_name, world_rank, world_size);
> >
> >     // Finalize the MPI environment.
> >     MPI_Finalize();
> > }
>
>
> This throws an error (on a single node it produces the following error
> messages; when run on two nodes it doesn't produce any output at all):
> > [hydra001:20324] 1 more process has sent help message
> > help-mpi-btl-openib-cpc-base.txt / no cpcs for port
> > [hydra001:20324] Set MCA parameter "orte_base_help_aggregate" to 0 to
> > see all help / error messages
>
> >
> --------------------------------------------------------------------------
> > No OpenFabrics connection schemes reported that they were able to be
> > used on a specific port.  As such, the openib BTL (OpenFabrics
> > support) will be disabled for this port.
> >
> >   Local host:           hydra001
> >   Local device:         mlx4_0
> >   Local port:           1
> >   CPCs attempted:       udcm
> >
> --------------------------------------------------------------------------
> > Hello world from processor hydra001, rank 0 out of 1 processors
>
> So, where can I find a documented list of all these MCA parameters? There
> doesn't seem to be such a list on open-mpi.org, or I didn't find it...
> Thanks in advance for pointing me to the right place.
>
> Sven Schumacher
>
>
>
>
>
>
> --
> Sven Schumacher - System Administrator   Tel: (0511) 762-2753
> Leibniz Universitaet Hannover
> Institut für Turbomaschinen und Fluid-Dynamik       - TFD
> Appelstraße 9 - 30167 Hannover
> Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
> Callinstraße 36 - 30167 Hannover
>
>



-- 
Best regards.
