Thank you very much. I tried

  mpirun -np 6 -machinefile ./myh -mca pml cm ./b_eff

and, to amuse you,

  mpirun -np 6 -machinefile ./myh -mca btl mx,sm,self ./b_eff

with myh containing two host names, and both commands went swimmingly.
To make absolutely sure, I checked the usage of the Myrinet ports: on
each system 3 Myrinet ports were open.
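In case it is useful to anyone else hitting the same problem, here is a rough
sketch of that check as a script. It is only a sketch: it assumes passwordless
ssh to the compute nodes, uses the hostnames and the /opt/mx path from the
mails quoted below, and expects b_eff in the current directory.

  #!/bin/sh
  # Sketch only: launch the benchmark over the MX MTL and, while it runs,
  # count the open MX endpoints on each node. Hostnames, slot counts and
  # the /opt/mx path are taken from the mails quoted below; adjust to taste.

  HOSTS="m2009 m2010"

  # Build the machinefile: two hosts, 4 slots each.
  : > ./myh
  for h in $HOSTS; do
      echo "$h slots=4" >> ./myh
  done

  # Start 6 ranks over the MX MTL (cm PML) in the background.
  mpirun -np 6 -machinefile ./myh -mca pml cm ./b_eff &
  job=$!

  # Give the ranks a moment to open their endpoints, then look at each node;
  # with 6 ranks on 2 hosts one would expect 3 open endpoints per host.
  sleep 5
  for h in $HOSTS; do
      echo "=== $h ==="
      ssh "$h" /opt/mx/bin/mx_endpoint_info
  done

  wait $job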
Lydia

On Mon, 20 Nov 2006, users-requ...@open-mpi.org wrote:

> ------------------------------
>
> Message: 2
> Date: Mon, 20 Nov 2006 20:05:22 +0000 (GMT)
> From: Lydia Heck <lydia.h...@durham.ac.uk>
> Subject: [OMPI users] myrinet mx and openmpi using solaris, sun compilers
> To: us...@open-mpi.org
> Message-ID: <pine.gso.4.53.0611201939260.3...@duss0-ast.phyast.dur.ac.uk>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
> I have built the Myrinet drivers with gcc and with the Studio 11 compilers
> from Sun; the following problem appears for both installations.
>
> I have tested the Myrinet installations using Myricom's own test programs.
>
> I then built Open MPI using the Studio 11 compilers, with Myrinet enabled.
>
> All the library paths are correctly set, and I can run my test program
> (written in C) successfully if I choose the number of CPUs to be equal to
> the number of nodes, i.e. one process per node. Each node has 4 CPUs.
>
> If I now request more CPUs for the run than there are nodes, I get an error
> message which clearly indicates that Open MPI cannot communicate over more
> than one channel on the Myrinet card. However, it should be able to
> communicate over at least 4 channels - colleagues of mine are doing that
> using MPICH and the same type of Myrinet card.
>
> Any ideas why this should happen?
>
> The hostfile looks like:
>
>   m2009 slots=4
>   m2010 slots=4
>
> but it will produce the same error if the hostfile is simply
>
>   m2009
>   m2010
>
> ompi_info | grep mx
> 2001(128) > ompi_info | grep mx
>           MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2)
>           MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2)
>
> m2009(160) > /opt/mx/bin/mx_endpoint_info
> 1 Myrinet board installed.
> The MX driver is configured to support up to 4 endpoints on 4 boards.
> ===================================================================
> Board #0:
> Endpoint    PID      Command Info
> <raw>       15039
> 0           15544
> There are currently 1 regular endpoint open
>
> m2001(120) > mpirun -np 6 -hostfile hostsfile -mca btl mx,self b_eff
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.4 is unable to reach 0.1.4 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.5 is unable to reach 0.1.4 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.
> This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> m2001(121) > mpirun -np 4 -hostfile hostsfile -mca btl mx b_eff
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.
> There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> ------------------------------
>
> Message: 3
> Date: Mon, 20 Nov 2006 13:25:55 -0700
> From: "Galen M. Shipman" <gship...@lanl.gov>
> Subject: Re: [OMPI users] myrinet mx and openmpi using solaris, sun compilers
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <45620f53....@lanl.gov>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> > m2001(120) > mpirun -np 6 -hostfile hostsfile -mca btl mx,self b_eff
>
> This does appear to be a bug, although you are using the MX BTL. Our
> higher-performance path is the MX MTL. To use this, try:
>
> mpirun -np 6 -hostfile hostsfile -mca pml cm b_eff
>
> Also, just for grins, could you try:
>
> mpirun -np 6 -hostfile hostsfile -mca btl mx,sm,self b_eff
>
> This will use the BTL interface but provides shared memory between
> processes on the same node.
>
> Thanks,
>
> Galen
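For the archive, here are the launch variants discussed in this thread side by
side, as a commented sketch. The hostfile name (hostsfile), the binary (b_eff)
and the MCA settings are the ones from the quoted mails; the comments
paraphrase Galen's explanation of the BTL/MTL split and are not authoritative.

  #!/bin/sh
  # Sketch only: the launch variants from this thread, annotated.
  # "hostsfile" and "b_eff" are the names used in the quoted mails.

  # Which MX components does this Open MPI build provide?
  ompi_info | grep mx

  # 1) MX BTL with only "self" added: in this setup, ranks placed on the
  #    same node end up with no usable path to each other, hence the
  #    "unable to reach ... for MPI communication" errors above.
  mpirun -np 6 -hostfile hostsfile -mca btl mx,self b_eff

  # 2) MX BTL plus the shared-memory (sm) BTL: same-node ranks talk via
  #    shared memory, off-node ranks via MX.
  mpirun -np 6 -hostfile hostsfile -mca btl mx,sm,self b_eff

  # 3) MX MTL via the cm PML: the higher-performance path Galen suggests,
  #    with the MX library doing the message matching itself.
  mpirun -np 6 -hostfile hostsfile -mca pml cm b_eff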
------------------------------------------
Dr E L Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.h...@durham.ac.uk

Tel.: +44 191 334 3628
Fax.: +44 191 334 3645
___________________________________________