Rangam,

It does not want to run at all. Attached is the log file from the run of the batch file you sent.
On Sat, Nov 20, 2010 at 10:32 AM, Addepalli, Srirangam V <srirangam.v.addepa...@ttu.edu> wrote:

> Hello Tushar,
> MPIRUN is not able to spawn processes on the node allocated. This should
> help
>
> #!/bin/sh
> #PBS -V
> #PBS -q wasatch
> #PBS -N SWMF
> #PBS -l nodes=2:ppn=8
> # change to the run directory
> #cd $SWMF_v2.3/run
> cat `echo ${PBS_NODEFILE}` > list_of_nodes
> mpirun -np 8 /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log
>
>
> Rangam
>
>
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
> Tushar Andriyas [thugnomic...@gmail.com]
> Sent: Saturday, November 20, 2010 10:48 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Unable to find the following executable
>
> Hi Rangam,
>
> I ran the batch file that you gave and have attached the error file. Also,
> since the WASATCH cluster is kind of small, people usually run on UINTA. So,
> if possible could you look at the uinta error files?
> Tushar
>
> On Fri, Nov 19, 2010 at 12:31 PM, Addepalli, Srirangam V <srirangam.v.addepa...@ttu.edu> wrote:
> Hello Tushar,
> After looking at the log files you attached it appears that there are
> multiple issues.
>
> [0,1,11]: Myrinet/GM on host wasatch-55 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
>
> Usually they occur if there is a mismatch in mpirun version and mca btl
> selection.
> I suggest the following order to check if the job actually works
> on a single node
>
> #!/bin/sh
> #PBS -V
> #PBS -q wasatch
> #PBS -N SWMF
> #PBS -l nodes=2:ppn=8
> # change to the run directory
> #cd $SWMF_v2.3/run
> cat `echo ${PBS_NODEFILE}` > list_of_nodes
> mpirun -np 8 -machinefile list_of_nodes /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log
>
>
> Rangam
>
>
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
> Tushar Andriyas [thugnomic...@gmail.com]
> Sent: Friday, November 19, 2010 1:11 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Unable to find the following executable
>
> Hey Rangam,
>
> I tried out the batch script and the error file comes out empty and the
> output file has /home/A00945081/SWM_v2.3/run/SWMF.exe (WHEN RUN ON A SINGLE
> MACHINE) and the same with multiple machines in the run. So, does that mean
> that the exe is auto mounted? What should I do next?
>
> Tushar
>
> On Fri, Nov 19, 2010 at 10:05 AM, Addepalli, Srirangam V <srirangam.v.addepa...@ttu.edu> wrote:
> Hello Tushar,
>
> Try the following script.
>
> #!/bin/sh
> #PBS -V
> #PBS -q wasatch
> #PBS -N SWMF
> #PBS -l nodes=1:ppn=8
> # change to the run directory
> #cd $SWMF_v2.3/run
> cat `echo ${PBS_NODEFILE}` > list_of_nodes
>
>
>
>
> The objective is to check if your user directories are auto mounted on
> compute nodes and are available during run time.
>
> If the job returns information about SWMF.exe then it can be safely assumed
> that user directories are being auto mounted.
>
> Rangam
>
>
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
> Tushar Andriyas [thugnomic...@gmail.com]
> Sent: Friday, November 19, 2010 8:35 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Unable to find the following executable
>
> It just gives back the info on folders in my home directory. Don't get me
> wrong, but I'm kinda new at this. So, could you type out the full command
> which I need to give?
>
> Tushar
>
> On Thu, Nov 18, 2010 at 8:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
> You can qsub a simple "ls" on that path - that will tell you if the path is
> valid on all machines in that allocation.
>
> What typically happens is that home directories aren't remotely mounted, or
> are mounted on a different location.
>
>
> On Thu, Nov 18, 2010 at 8:31 AM, Tushar Andriyas <thugnomic...@gmail.com> wrote:
> No, it's not in the same directory as SWMF. I guess the path is the same
> since all the machines in a cluster are configured the same way. How do I know
> if this is not the case?
>
>
> On Thu, Nov 18, 2010 at 8:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Is your "hello world" test program in the same directory as SWMF? Is it
> possible that the path you are specifying is not available on all of the
> remote machines? That's the most common problem we see.
>
>
> On Thu, Nov 18, 2010 at 7:59 AM, Tushar Andriyas <thugnomic...@gmail.com> wrote:
> Hi there,
>
> Thanks for the prompt reply. The thing is that although the mpirun is
> set up correctly (since a simple hello world works), when I run the main
> SWMF.exe executable, the cluster machines somehow fail to find the
> executable (SWMF.exe).
>
> So, I have attached the sample error file from one of the runs
> (SWMF.e143438) and also the MAKEFILES so that you could better gauge the
> problem. The makefiles have Linux as the OS and pgf90 as compiler with
> mpif90 as the linker. I am using openmpi-1.2.7-pgi. Job is submitted using a
> batch file (job.bats) and the scheduler is Torque (version I am not sure but
> I can see three on the machines viz 2.0.0, 2.2.1, 2.5.2).
>
> I have also attached an error file from one of the clusters (WASATCH viz
> SWMF.e143439) and UINTA (SWMF.e143440) with the whole path of the exe as
> Srirangam mentioned as follows (in the batch file).
>
> mpirun --prefix /opt/libraries/openmpi/openmpi-1.2.7-pgi /home/A00945081/SWMF_v2.3/run/SWMF.exe > runlog_`date +%y%m%d%H%M`
>
> I have tried both mpirun and mpiexec but nothing seems to work.
>
> Tushar
>
>
> On Wed, Nov 17, 2010 at 8:12 PM, Addepalli, Srirangam V <srirangam.v.addepa...@ttu.edu> wrote:
> Hello Tushar,
> Have you tried supplying the full path of the executable just to check?
> Rangam
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
> Tushar Andriyas [thugnomic...@gmail.com]
> Sent: Wednesday, November 17, 2010 8:49 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] Unable to find the following executable
>
> Hi there,
>
> I am new to using mpi commands and am stuck on a problem with running a
> code. When I submit my job through a batch file, the job exits with the
> message that the executable could not be found on the machines.
> I have tried
> a lot of options such as PBS -V and so on but the problem persists. If
> someone is interested, I can send the full info on the cluster, the compiler
> and openmpi settings and other stuff. BTW the launcher is torque (which you
> might have guessed). The code does not have a forum so I am in a deep mire.
>
> Thanks,
> Tushar
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
SWMF.e143737
Description: Binary data
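
For anyone following along, the "qsub a simple ls on that path" check suggested in the thread could be sketched as a small Torque job script roughly like the one below. This is only a sketch under assumptions, not something taken from the attachments: the helper names unique_nodes and check_exe are made up for illustration, the executable path is the one quoted in the thread, and the per-node loop assumes passwordless ssh between compute nodes, which not every site allows.

```shell
#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N SWMF_check
#PBS -l nodes=2:ppn=8
# Sketch: confirm the executable is visible on every allocated node
# before calling mpirun. Helper names below are hypothetical.

# Print each host in a node file once (Torque lists a host once per core).
unique_nodes() {
    sort -u "$1"
}

# Succeed only if the path is a regular file with execute permission.
check_exe() {
    [ -f "$1" ] && [ -x "$1" ]
}

# PBS_NODEFILE is set by Torque inside a job; skip the checks otherwise.
if [ -n "${PBS_NODEFILE:-}" ]; then
    EXE=/home/A00945081/SWMF_v2.3/run/SWMF.exe   # path quoted in the thread
    for node in $(unique_nodes "$PBS_NODEFILE"); do
        # Assumption: passwordless ssh to compute nodes is permitted.
        if ssh -o BatchMode=yes "$node" "test -x '$EXE'"; then
            echo "$node: found $EXE"
        else
            echo "$node: MISSING $EXE"
        fi
    done
fi
```

If every node reports "found", the home directory is most likely mounted at the same location everywhere and the problem lies elsewhere (for example in the mpirun version or transport selection discussed above); a "MISSING" line would point to the mount problem Ralph describes.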