Hi Ralph,

With the flag -mca btl gm,sm,self, running the job manually works and performs better, as you said!
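As I understand it, the same BTL list can also be set outside the mpirun command line, through an MCA parameter file or an environment variable, so that interactive and batch runs are guaranteed to pick up identical settings. Something along these lines (just a sketch; I am assuming the default file location for our install):

# $HOME/.openmpi/mca-params.conf
btl = gm,sm,self

# or exported in the shell / PBS script before calling mpirun
export OMPI_MCA_btl=gm,sm,self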
However, it still hangs when it goes through the PBS scheduler. Here is my PBS script:

#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
#PBS -k eo
cd ~kaisong/test
echo `pwd`
cat $PBS_NODEFILE
echo "Starting parallel job"
/home/software/ompi/1.3.2-pgi/bin/mpirun -mca btl gm,self --display-allocation --display-map -d 8 -v -machinefile $PBS_NODEFILE -np 4 ./hello-hostname
echo "ending parallel job"

The error message and output file from Torque are the same as before (see the P.S. at the bottom of this message for the contents of $PBS_NODEFILE itself). What other problems do you think it could be? Please let me know if you need more information about our system.

Thanks a lot for helping me along this far! I hope we are getting close to finding the real problem.

Kai
--------------------
Kai Song
<ks...@lbl.gov> 1.510.486.4894
High Performance Computing Services (HPCS) Intern
Lawrence Berkeley National Laboratory - http://scs.lbl.gov

----- Original Message -----
From: Ralph Castain <r...@open-mpi.org>
Date: Thursday, July 23, 2009 1:06 pm
Subject: Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
To: "Song, Kai Song" <ks...@lbl.gov>
Cc: Open MPI Users <us...@open-mpi.org>

> My apologies - I had missed that -mca btl flag. That is the source of the trouble. IIRC, GM doesn't have a loopback method in it. OMPI requires that -every- proc be able to reach -every- proc, including itself.
>
> So you must include the "self" btl at a minimum. Also, if you want more performance, you probably want to include the shared memory BTL as well.
>
> So the recommended param would be:
>
> -mca btl gm,sm,self
>
> Order doesn't matter. I'm disturbed that it would hang when you run in batch, though, instead of abort. Try with this new flag and see if it runs in both batch and interactive mode.
>
> HTH
> Ralph
>
> On Jul 23, 2009, at 1:10 PM, Song, Kai Song wrote:
>
> > Hi Ralph,
> >
> > Thanks for the fast reply! I put the --display-allocation and --display-map flags on and it looks like the node allocation is just fine, but the job still hangs.
> >
> > The output looks like this:
> > /home/kaisong/test
> > node0001
> > node0001
> > node0000
> > node0000
> > Starting parallel job
> >
> > ====================== ALLOCATED NODES ======================
> >
> >  Data for node: Name: node0001   Num slots: 2   Max slots: 0
> >  Data for node: Name: node0000   Num slots: 2   Max slots: 0
> >
> > =================================================================
> >
> > ======================== JOB MAP ========================
> >
> >  Data for node: Name: node0001   Num procs: 2
> >         Process OMPI jobid: [16591,1] Process rank: 0
> >         Process OMPI jobid: [16591,1] Process rank: 1
> >
> >  Data for node: Name: node0000   Num procs: 2
> >         Process OMPI jobid: [16591,1] Process rank: 2
> >         Process OMPI jobid: [16591,1] Process rank: 3
> >
> > =============================================================
> > (no hello world output; the job just hangs here until timeout)
> >
> > And a similar thing in the error output:
> > node0000 - daemon did not report back when launched
> >
> > Then, I ran the job manually by adding the "-mca btl gm" flag for mpirun:
> > /home/software/ompi/1.3.2-pgi/bin/mpirun -mca btl gm --display-allocation --display-map -v -machinefile ./node -np 4 ./hello-hostname
> >
> > MPI crashed with the following output/error:
> > ====================== ALLOCATED NODES ======================
> >
> >  Data for node: Name: hbar.lbl.gov   Num slots: 0   Max slots: 0
> >  Data for node: Name: node0045   Num slots: 4   Max slots: 0
> >  Data for node: Name: node0046   Num slots: 4   Max slots: 0
> >  Data for node: Name: node0047   Num slots: 4   Max slots: 0
> >  Data for node: Name: node0048   Num slots: 4   Max slots: 0
> >
> > =================================================================
> >
> > ======================== JOB MAP ========================
> >
> >  Data for node: Name: node0045   Num procs: 4
> >         Process OMPI jobid: [62741,1] Process rank: 0
> >         Process OMPI jobid: [62741,1] Process rank: 1
> >         Process OMPI jobid: [62741,1] Process rank: 2
> >         Process OMPI jobid: [62741,1] Process rank: 3
> >
> > =============================================================
> > --------------------------------------------------------------------------
> > At least one pair of MPI processes are unable to reach each other for
> > MPI communications. This means that no Open MPI device has indicated
> > that it can be used to communicate between these processes. This is
> > an error; Open MPI requires that all MPI processes be able to reach
> > each other. This error can sometimes be the result of forgetting to
> > specify the "self" BTL.
> >
> >   Process 1 ([[62741,1],1]) is on host: node0045
> >   Process 2 ([[62741,1],1]) is on host: node0045
> >   BTLs attempted: gm
> >
> > Your MPI job is now going to abort; sorry.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   PML add procs failed
> >   --> Returned "Unreachable" (-12) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > [node0045:366] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > [node0045:367] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > [node0045:368] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> >
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > [node0045:365] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 3 with PID 368 on
> > node node0045 exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
> > [hbar.lbl.gov:07770] 3 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
> > [hbar.lbl.gov:07770] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> > [hbar.lbl.gov:07770] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal
> >
> >
> > However, it works if I add "self" to the -mca flag:
> > /home/software/ompi/1.3.2-pgi/bin/mpirun -mca btl gm,tcp,self --display-allocation --display-map -v -machinefile ./node -np 16 ./hello-hostname
> >
> > ====================== ALLOCATED NODES ======================
> >
> >  Data for node: Name: node0045   Num slots: 4   Max slots: 0
> >  Data for node: Name: node0046   Num slots: 4   Max slots: 0
> >  Data for node: Name: node0047   Num slots: 4   Max slots: 0
> >  Data for node: Name: node0048   Num slots: 4   Max slots: 0
> >
> > =================================================================
> >
> > ======================== JOB MAP ========================
> >
> >  Data for node: Name: node0045   Num procs: 4
> >         Process OMPI jobid: [49981,1] Process rank: 0
> >         Process OMPI jobid: [49981,1] Process rank: 1
> >         Process OMPI jobid: [49981,1] Process rank: 2
> >         Process OMPI jobid: [49981,1] Process rank: 3
> >
> >  Data for node: Name: node0046   Num procs: 4
> >         Process OMPI jobid: [49981,1] Process rank: 4
> >         Process OMPI jobid: [49981,1] Process rank: 5
> >         Process OMPI jobid: [49981,1] Process rank: 6
> >         Process OMPI jobid: [49981,1] Process rank: 7
> >
> >  Data for node: Name: node0047   Num procs: 4
> >         Process OMPI jobid: [49981,1] Process rank: 8
> >         Process OMPI jobid: [49981,1] Process rank: 9
> >         Process OMPI jobid: [49981,1] Process rank: 10
> >         Process OMPI jobid: [49981,1] Process rank: 11
> >
> >  Data for node: Name: node0048   Num procs: 4
> >         Process OMPI jobid: [49981,1] Process rank: 12
> >         Process OMPI jobid: [49981,1] Process rank: 13
> >         Process OMPI jobid: [49981,1] Process rank: 14
> >         Process OMPI jobid: [49981,1] Process rank: 15
> >
> > =============================================================
> > Hello world from process 13 of 16
> > Hostname: node0048
> > Hello world from process 15 of 16
> > Hostname: node0048
> > Hello world from process 12 of 16
> > Hostname: node0048
> > Hello world from process 3 of 16
> > Hostname: node0045
> > Hello world from process 6 of 16
> > Hostname: node0046
> > Hello world from process 8 of 16
> > Hostname: node0047
> > Hello world from process 0 of 16
> > Hostname: node0045
> > Hello world from process 4 of 16
> > Hostname: node0046
> > Hello world from process 2 of 16
> > Hostname: node0045
> > Hello world from process 5 of 16
> > Hostname: node0046
> > Hello world from process 9 of 16
> > Hostname: node0047
> > Hello world from process 10 of 16
> > Hostname: node0047
> > Hello world from process 11 of 16
> > Hostname: node0047
> > Hello world from process 14 of 16
> > Hostname: node0048
> > Hello world from process 1 of 16
> > Hostname: node0045
> > Hello world from process 7 of 16
> > Hostname: node0046
> >
> >
> > So, I suspect it is not a parsing problem with the -machinefile flag. Somehow the nodes don't communicate with the "-mca btl gm" option on. Do you think it is a compatibility problem with the Myrinet driver?
> >
> > Thanks again for your help!
> >
> > Kai
> > --------------------
> > Kai Song
> > <ks...@lbl.gov> 1.510.486.4894
> > High Performance Computing Services (HPCS) Intern
> > Lawrence Berkeley National Laboratory - http://scs.lbl.gov
> >
> >
> > ----- Original Message -----
> > From: Ralph Castain <r...@open-mpi.org>
> > Date: Wednesday, July 22, 2009 5:03 pm
> > Subject: Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
> > To: Open MPI Users <us...@open-mpi.org>
> > Cc: "Song, Kai Song" <ks...@lbl.gov>
> >
> >> mpirun --display-allocation --display-map
> >>
> >> Run a batch job that just prints out $PBS_NODEFILE. I'll bet that it isn't what we are expecting, and that the problem comes from it.
> >>
> >> In a Torque environment, we read that file to get the list of nodes and #slots/node that are allocated to your job. We then filter that through any hostfile you provide. So all the nodes have to be in the $PBS_NODEFILE, which has to be in the expected format.
> >>
> >> I'm a little suspicious, though, because of your reported error. It sounds like we are indeed trying to launch a daemon on a known node. I can only surmise a couple of possible reasons for the failure:
> >>
> >> 1. this is a node that is not allocated for your use. Was node0006 in your allocation?? If not, then the launch would fail. This would indicate we are not parsing the nodefile correctly.
> >>
> >> 2. if the node is in your allocation, then I would wonder if you have a TCP connection between that node and the one where mpirun exists. Is there a firewall in the way? Or something that would preclude a connection? Frankly, I doubt this possibility because it works when run manually.
> >>
> >> My money is on option #1. :-)
> >>
> >> If it is #1 and you send me a copy of a sample $PBS_NODEFILE on your system, I can create a way to parse it so we can provide support for that older version.
> >>
> >> Ralph
> >>
> >>
> >> On Jul 21, 2009, at 4:44 PM, Song, Kai Song wrote:
> >>
> >>> Hi Ralph,
> >>>
> >>> Thanks a lot for the fast response.
> >>>
> >>> Could you give me more instructions on which command I should put "--display-allocation" and "--display-map" with? mpirun? ./configure?...
> >>>
> >>> Also, we have tested that in our PBS script, if we put node=1, the helloworld works. But, when I put node=2 or more, it will hang until timeout. And the error message will be something like:
> >>> node0006 - daemon did not report back when launched
> >>>
> >>> However, if we don't go through the scheduler and run MPI manually, everything works fine too:
> >>> /home/software/ompi/1.3.2-pgi/bin/mpirun -machinefile ./nodes -np 16 ./a.out
> >>>
> >>> What do you think the problem would be? It's not a network issue, because manually running MPI works. That is why we question the torque compatibility.
> >>>
> >>> Thanks again,
> >>>
> >>> Kai
> >>>
> >>> --------------------
> >>> Kai Song
> >>> <ks...@lbl.gov> 1.510.486.4894
> >>> High Performance Computing Services (HPCS) Intern
> >>> Lawrence Berkeley National Laboratory - http://scs.lbl.gov
> >>>
> >>>
> >>> ----- Original Message -----
> >>> From: Ralph Castain <r...@open-mpi.org>
> >>> Date: Tuesday, July 21, 2009 12:12 pm
> >>> Subject: Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
> >>> To: Open MPI Users <us...@open-mpi.org>
> >>>
> >>>> I'm afraid I have no idea - I've never seen a Torque version that old, however, so it is quite possible that we don't work with it. It also looks like it may have been modified (given the p2-aspen3 on the end), so I have no idea how the system would behave.
> >>>>
> >>>> First thing you could do is verify that the allocation is being read correctly. Add a --display-allocation to the cmd line and see what we think Torque gave us. Then add --display-map to see where it plans to place the processes.
> >>>>
> >>>> If all that looks okay, and if you allow ssh, then try -mca plm rsh on the cmd line and see if that works.
> >>>>
> >>>> HTH
> >>>> Ralph
> >>>>
> >>>>
> >>>> On Tue, Jul 21, 2009 at 12:57 PM, Song, Kai Song <ks...@lbl.gov> wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> I am building open-mpi-1.3.2 on centos-3.4, with torque-1.1.0p2-aspen3 and myrinet. I compiled it just fine with this configuration:
> >>>>> ./configure --prefix=/home/software/ompi/1.3.2-pgi --with-gm=/usr/local/ --with-gm-libdir=/usr/local/lib64/ --enable-static --disable-shared --with-tm=/usr/ --without-threads CC=pgcc CXX=pgCC FC=pgf90 F77=pgf77 LDFLAGS=-L/usr/lib64/torque/
> >>>>>
> >>>>> However, when I submit jobs for 2 or more nodes through the torque scheduler, the jobs just hang there. They show the RUN state, but there is no communication between the nodes, and then the jobs die with a timeout.
> >>>>>
> >>>>> We have confirmed that the Myrinet is working, because our lam-mpi-7.1 works just fine. We are having a really hard time determining the causes of this problem, so we suspect it's because our torque is too old.
> >>>>>
> >>>>> What is the lowest version requirement of torque for open-mpi-1.3.2? The README file didn't specify this detail. Does anyone know more about it?
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Kai
> >>>>> --------------------
> >>>>> Kai Song
> >>>>> <ks...@lbl.gov> 1.510.486.4894
> >>>>> High Performance Computing Services (HPCS) Intern
> >>>>> Lawrence Berkeley National Laboratory - http://scs.lbl.gov
> >>>>>
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> us...@open-mpi.org
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>>
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
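P.S. Regarding the $PBS_NODEFILE format that came up earlier in the thread: the `cat $PBS_NODEFILE` line in the script above prints one hostname per allocated slot, and in the last nodes=2:ppn=2 run it contained:

node0001
node0001
node0000
node0000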
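P.P.S. Following your earlier suggestions, a stripped-down debug script along these lines might help narrow down whether it is the nodefile parsing or the daemon launch (just a sketch, not yet tested on our system):

#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
#PBS -k eo
cd ~kaisong/test
# show exactly what Torque hands to Open MPI
cat $PBS_NODEFILE
# bypass the tm launcher and start the daemons over ssh/rsh instead, as you suggested
/home/software/ompi/1.3.2-pgi/bin/mpirun -mca plm rsh -mca btl gm,sm,self -np 4 ./hello-hostname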