Hi Ralph,

Thank you.  I didn't know that "--without-tm" was the correct configure
option.  I built and reinstalled OpenMPI 1.4.2, and now I no longer need to
set PBS_JOBID for it to recognize the correct machine file.  My current
workflow is:

1) Submit a multiple-node batch job.
2) Launch a separate process on each node with "pbsdsh".
3) On each node, create a file called
/scratch/leeping/pbs_nodefile.$HOSTNAME which contains 24 instances of the
hostname (one per core, since there are 24 cores per node).
4) Set PBS_NODEFILE=/scratch/leeping/pbs_nodefile.$HOSTNAME.
5) In the Q-Chem wrapper script, make sure mpirun is called with the command
line argument: -machinefile $PBS_NODEFILE
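
For reference, steps 3-5 boil down to roughly the following in the per-node
script that pbsdsh launches in step 2 (my paths; the exact Q-Chem command
line is whatever the wrapper builds):

  # one line per core, so mpirun starts 24 ranks on this node
  for i in $(seq 1 24); do echo $HOSTNAME; done > /scratch/leeping/pbs_nodefile.$HOSTNAME
  export PBS_NODEFILE=/scratch/leeping/pbs_nodefile.$HOSTNAME
  # the Q-Chem wrapper eventually runs something like:
  #   mpirun -machinefile $PBS_NODEFILE -np 24 qcprog.exe ...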

Everything seems to work, thanks to you and Gus.  I might report back
if the jobs fail halfway through or if there is no speedup, but for now
everything seems to be in place.

- Lee-Ping

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Saturday, August 10, 2013 4:28 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.

It helps if you use the correct configure option: --without-tm

Regardless, you can always deselect Torque support at runtime. Just put the
following in your environment:

OMPI_MCA_ras=^tm

That will tell ORTE to ignore the Torque allocation module and it should
then look at the machinefile.
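
For example, something like this in your batch script before the Q-Chem
wrapper runs (just a sketch; the actual Q-Chem invocation is whatever your
wrapper produces):

  export OMPI_MCA_ras=^tm
  mpirun -machinefile $PBS_NODEFILE -np 24 <your Q-Chem command>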


On Aug 10, 2013, at 4:18 PM, "Lee-Ping Wang" <leep...@stanford.edu> wrote:

> Hi Gus,
> 
> I agree that $PBS_JOBID should not point to a file in normal 
> situations, because it is the job identifier given by the scheduler.  
> However, ras_tm_module.c actually does search for a file named 
> $PBS_JOBID, and that seems to be why it was failing.  You can see this 
> in the source code as well (look at ras_tm_module.c, I uploaded it to 
> https://dl.dropboxusercontent.com/u/5381783/ras_tm_module.c ).  Once I 
> changed the $PBS_JOBID environment variable to the name of the node 
> file, things seemed to work - though I agree, it's not very logical.
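> 
> For illustration, the workaround boils down to something like this right
> before the Q-Chem wrapper is invoked (my paths, and only lightly tested):
> 
>   export PBS_NODEFILE=/scratch/leeping/pbs_nodefile.$HOSTNAME
>   export PBS_JOBID=pbs_nodefile.$HOSTNAME
>   # OpenMPI's tm module then opens /scratch/leeping/$PBS_JOBID,
>   # which now resolves to my custom node file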
> 
> I doubt Q-Chem is causing the issue, because I was able to "fix" 
> things by changing $PBS_JOBID before Q-Chem is called.  Also, I 
> provided the command line to mpirun in a previous email, where the 
> -machinefile argument correctly points to the custom machine file that 
> I created.  The missing environment variables should not matter.
> 
> The PBS_NODEFILE created by Torque is
> /opt/torque/aux//272139.certainty.stanford.edu and it never gets 
> touched.  I followed the advice in your earlier email and I created my 
> own node file on each node called 
> /scratch/leeping/pbs_nodefile.$HOSTNAME, and I set PBS_NODEFILE to 
> point to this file.  However, this file does not get used either, even 
> if I include it on the mpirun command line, unless I set PBS_JOBID to the
> file name.
> 
> Finally, I was not able to build OpenMPI 1.4.2 without pbs support.  I 
> used the configure flag --without-rte-support, but the build failed 
> halfway through.
> 
> Thanks,
> 
> - Lee-Ping
> 
> leeping@certainty-a:~/temp$ qsub -I -q debug -l walltime=1:00:00 -l
> nodes=1:ppn=12
> qsub: waiting for job 272139.certainty.stanford.edu to start
> qsub: job 272139.certainty.stanford.edu ready
> 
> leeping@compute-140-4:~$ echo $PBS_NODEFILE 
> /opt/torque/aux//272139.certainty.stanford.edu
> 
> leeping@compute-140-4:~$ cat $PBS_NODEFILE
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> 
> leeping@compute-140-4:~$ echo $PBS_JOBID
> 272139.certainty.stanford.edu
> 
> leeping@compute-140-4:~$ cat $PBS_JOBID
> cat: 272139.certainty.stanford.edu: No such file or directory
> 
> leeping@compute-140-4:~$ env | grep PBS
> PBS_VERSION=TORQUE-2.5.3
> PBS_JOBNAME=STDIN
> PBS_ENVIRONMENT=PBS_INTERACTIVE
> PBS_O_WORKDIR=/home/leeping/temp
> PBS_TASKNUM=1
> PBS_O_HOME=/home/leeping
> PBS_MOMPORT=15003
> PBS_O_QUEUE=debug
> PBS_O_LOGNAME=leeping
> PBS_O_LANG=en_US.iso885915
> PBS_JOBCOOKIE=A27B00DAF72024CBEBB7CD3752BDBADC
> PBS_NODENUM=0
> PBS_NUM_NODES=1
> PBS_O_SHELL=/bin/bash
> PBS_SERVER=certainty.stanford.edu
> PBS_JOBID=272139.certainty.stanford.edu
> PBS_O_HOST=certainty-a.local
> PBS_VNODENUM=0
> PBS_QUEUE=debug
> PBS_O_MAIL=/var/spool/mail/leeping
> PBS_NUM_PPN=12
> PBS_NODEFILE=/opt/torque/aux//272139.certainty.stanford.edu
> PBS_O_PATH=/opt/intel/Compiler/11.1/064/bin/intel64:/opt/intel/Compiler/11.1/064/bin/intel64:/usr/local/cuda/bin:/home/leeping/opt/psi-4.0b5/bin:/home/leeping/opt/tinker/bin:/home/leeping/opt/cctools/bin:/home/leeping/bin:/home/leeping/local/bin:/home/leeping/opt/bin:/usr/kerberos/bin:/usr/java/latest/bin:/usr/local/bin:/bin:/usr/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/openmpi/bin/:/opt/maui/bin:/opt/torque/bin:/opt/torque/sbin:/opt/rocks/bin:/opt/rocks/sbin:/opt/sun-ct/bin:/home/leeping/bin
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gustavo 
> Correa
> Sent: Saturday, August 10, 2013 3:58 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Error launching single-node tasks from 
> multiple-node job.
> 
> Lee-Ping
> 
> Something looks amiss.
> PBS_JOBID contains the job name.
> PBS_NODEFILE contains a list (with repetitions up to the number of 
> cores) of the nodes that torque assigned to the job.
> 
> Why things get twisted is hard to tell; it may be something in the
> Q-Chem scripts (could they be mixing up PBS_JOBID and PBS_NODEFILE?), or
> it may be something else.
> A more remote possibility is that the cluster has a Torque qsub wrapper
> that produces the confusion described above.  Unlikely, but possible.
> 
> To sort this out, run any simple job (mpiexec -np 32 hostname), or even
> your actual Q-Chem job, but precede it with a few printouts of the PBS
> environment variables:
> echo $PBS_JOBID
> echo $PBS_NODEFILE
> ls -l $PBS_NODEFILE
> cat $PBS_NODEFILE
> cat $PBS_JOBID   [this one should fail, because that is not a file, but
> it may work if the PBS variables were messed up along the way]
> 
> I hope this helps,
> Gus Correa
> 
> On Aug 10, 2013, at 6:39 PM, Lee-Ping Wang wrote:
> 
>> Hi Gus,
>> 
>> It seems the calculation is now working, or at least it didn't crash.  
>> I set the PBS_JOBID environment variable to the name of my custom 
>> node file.  That is to say, I set
>> PBS_JOBID=pbs_nodefile.compute-3-3.local.
>> It appears that ras_tm_module.c is trying to open the file located at 
>> /scratch/leeping/$PBS_JOBID for some reason, and it is disregarding 
>> the machinefile argument on the command line.
>> 
>> It'll be a few hours before I know for sure whether the job actually
>> worked.
>> I still don't know why things are structured this way, however. 
>> 
>> Thanks,
>> 
>> - Lee-Ping
>> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping 
>> Wang
>> Sent: Saturday, August 10, 2013 3:07 PM
>> To: 'Open MPI Users'
>> Subject: Re: [OMPI users] Error launching single-node tasks from 
>> multiple-node job.
>> 
>> Hi Gus,
>> 
>> I tried your suggestions.  Here is the command line which executes mpirun.
>> I was puzzled because it still reported a file open failure, so I inserted
>> a print statement into ras_tm_module.c and recompiled.  The results are
>> below.  As you can see, it tries to open a different file
>> (/scratch/leeping/272055.certainty.stanford.edu) than the one I specified
>> (/scratch/leeping/pbs_nodefile.compute-3-3.local).
>> 
>> - Lee-Ping
>> 
>> === mpirun command line ===
>> /home/leeping/opt/openmpi-1.4.2-intel11-dbg/bin/mpirun -machinefile 
>> /scratch/leeping/pbs_nodefile.compute-3-3.local -x HOME -x PWD -x QC 
>> -x QCAUX -x QCCLEAN -x QCFILEPREF -x QCLOCALSCR -x QCPLATFORM -x 
>> QCREF -x QCRSH -x QCRUNNAME -x QCSCRATCH
>>                      -np 24 /home/leeping/opt/qchem40/exe/qcprog.exe
>> .B.in.28642.qcin.1 ./qchem28642/ >>B.out
>> 
>> === Error message from compute node ===
>> [compute-3-3.local:28666] Warning: could not find environment variable "QCLOCALSCR"
>> [compute-3-3.local:28666] Warning: could not find environment 
>> variable "QCREF"
>> [compute-3-3.local:28666] Warning: could not find environment 
>> variable "QCRUNNAME"
>> Attempting to open /scratch/leeping/272055.certainty.stanford.edu
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 155
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
>> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping 
>> Wang
>> Sent: Saturday, August 10, 2013 12:51 PM
>> To: 'Open MPI Users'
>> Subject: Re: [OMPI users] Error launching single-node tasks from 
>> multiple-node job.
>> 
>> Hi Gus,
>> 
>> Thank you.  You gave me many helpful suggestions, which I will try 
>> out and get back to you.  I will provide more specifics (e.g. how my 
>> jobs were
>> submitted) in a future email.  
>> 
>> As for the queue policy, that is a highly political issue because the 
>> cluster is a shared resource.  My usual recourse is to use the batch 
>> system as effectively as possible within the confines of their 
>> policies.  This is why it makes sense to submit a single 
>> multiple-node batch job, which then executes several independent
single-node tasks.
>> 
>> - Lee-Ping
>> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gustavo 
>> Correa
>> Sent: Saturday, August 10, 2013 12:39 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Error launching single-node tasks from 
>> multiple-node job.
>> 
>> Hi Lee-Ping
>> On Aug 10, 2013, at 3:15 PM, Lee-Ping Wang wrote:
>> 
>>> Hi Gus,
>>> 
>>> Thank you for your reply.  I want to run MPI jobs inside a single 
>>> node, but due to the resource allocation policies on the clusters, I 
>>> could get many more resources if I submit multiple-node "batch jobs".
>>> Once I have a multiple-node batch job, then I can use a command like 
>>> "pbsdsh" to run single node MPI jobs on each node that is allocated 
>>> to me.  Thus, the MPI jobs on each node are running independently of 
>>> each other and unaware of one another.
>> 
>> Even if you use pbsdsh to launch separate MPI jobs on individual
>> nodes, you probably (not 100% sure about that) need to specify a
>> -hostfile naming the specific node that each job will run on.
>> 
>> I'm still quite confused because you didn't say what your "qsub"
>> command looks like, what Torque script (if any) it is launching, etc.
>> 
>>> 
>>> The actual call to mpirun is nontrivial to get, because Q-Chem has a 
>>> complicated series of wrapper scripts which ultimately calls mpirun.
>> 
>> Yes, I just found this out on the Web.  See my previous email.
>> 
>>> If the
>>> jobs are failing immediately, then I only have a small window to 
>>> view the actual command through "ps" or something.
>>> 
>> 
>> Are you launching the jobs interactively?  
>> I.e., with the -I switch to qsub?
>> 
>> 
>>> Another option is for me to compile OpenMPI without Torque / PBS
>>> support.
>>> If I do that, then it won't look for the node file anymore.  Is this 
>>> correct?
>> 
>> You will need to tell mpiexec where to launch the jobs.
>> If I understand what you are trying to achieve (and I am not sure I
>> do), one way to do it would be to programmatically split the
>> $PBS_NODEFILE into several hostfiles, one per MPI job (so to speak)
>> that you want to launch.
>> Then use each of these nodefiles for each of the MPI jobs.
>> Note that the PBS_NODEFILE has one line per-node-per-core, *not* one 
>> line per node.
>> I have no idea how the trick above could be reconciled with the 
>> Q-Chem scripts, though.
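>> 
>> As a rough, untested sketch ("./my-Q-chem-executable" standing in for
>> whatever each job actually runs), the split could look like this inside
>> the Torque script:
>> 
>>   for node in $(sort -u $PBS_NODEFILE); do
>>     # keep only this node's lines (one line per core)
>>     grep -Fx "$node" $PBS_NODEFILE > hostfile.$node
>>     mpiexec -hostfile hostfile.$node -np $(wc -l < hostfile.$node) ./my-Q-chem-executable &
>>   done
>>   wait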
>> 
>> Overall, I don't understand why you would benefit from such a
>> complicated scheme, rather than launching either one big MPI job across
>> all the nodes that you requested (if the problem is large enough to
>> benefit from this many cores), or several small single-node jobs (if
>> the problem is small enough to fit well on a single node).
>> 
>> You may want to talk to the cluster managers, because there must be a
>> way to reconcile their queue policies with your needs (if this is not
>> already in place).
>> We run tons of parallel single-node jobs here, for problems that fit
>> well on a single node.
>> 
>> 
>> My two cents
>> Gus Correa
>> 
>>> 
>>> I will try your suggestions and get back to you.  Thanks!
>>> 
>>> - Lee-Ping
>>> 
>>> -----Original Message-----
>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gustavo
>> Correa
>>> Sent: Saturday, August 10, 2013 12:04 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Error launching single-node tasks from 
>>> multiple-node job.
>>> 
>>> Hi Lee-Ping
>>> 
>>> I know nothing about Q-Chem, but I was confused by these sentences:
>>> 
>>> "That is to say, these tasks are intended to use OpenMPI parallelism 
>>> on
>> each
>>> node, but no parallelism across nodes. "
>>> 
>>> "I do not observe this error when submitting single-node jobs."
>>> 
>>> "Since my jobs are only parallel over the node they're running on, I
>> believe
>>> that a node file of any kind is unnecessary. "
>>> 
>>> Are you trying to run MPI jobs across several nodes or inside a
>>> single node?
>>> 
>>> ***
>>> 
>>> Anyway, as far as I know,
>>> if your OpenMPI was compiled with Torque/PBS support, the
>>> mpiexec/mpirun command will look for the $PBS_NODEFILE to learn in
>>> which node(s) it should launch the MPI processes, regardless of
>>> whether you are using one node or more than one node.
>>> 
>>> You didn't send your mpiexec command line (which would help), but
>>> assuming that Q-Chem allows some level of standard mpiexec command
>>> options, you could force passing the $PBS_NODEFILE to it.
>>> 
>>> Something like this (for two nodes with 8 cores each):
>>> 
>>> #PBS -q myqueue
>>> #PBS -l nodes=2:ppn=8
>>> #PBS -N myjob
>>> cd $PBS_O_WORKDIR
>>> ls -l $PBS_NODEFILE
>>> cat $PBS_NODEFILE
>>> 
>>> mpiexec -hostfile $PBS_NODEFILE -np 16 ./my-Q-chem-executable <parameters to Q-chem>
>>> 
>>> I hope this helps,
>>> Gus Correa
>>> 
>>> On Aug 10, 2013, at 1:51 PM, Lee-Ping Wang wrote:
>>> 
>>>> Hi there,
>>>> 
>>>> Recently, I've begun some calculations on a cluster where I submit a
>>>> multiple node job to the Torque batch system, and the job executes
>>>> multiple single-node parallel tasks.  That is to say, these tasks are
>>>> intended to use OpenMPI parallelism on each node, but no parallelism
>>>> across nodes.
>>>> 
>>>> Some background: The actual program being executed is Q-Chem 4.0.  I
>>>> use OpenMPI 1.4.2 for this, because Q-Chem is notoriously difficult
>>>> to compile, and this is the last version of OpenMPI that this version
>>>> of Q-Chem is known to work with.
>>>> 
>>>> My jobs are failing with the error message below; I do not observe
>>>> this error when submitting single-node jobs.  From reading the
>>>> mailing list archives
>>>> (http://www.open-mpi.org/community/lists/users/2010/03/12348.php),
>>>> I believe it is looking for a PBS node file somewhere.  Since my jobs
>>>> are only parallel over the node they're running on, I believe that a
>>>> node file of any kind is unnecessary.
>>>> 
>>>> My question is: Why is OpenMPI behaving differently when I submit a
>>>> multi-node job compared to a single-node job?  How does OpenMPI detect
>>>> that it is running under a multi-node allocation?  Is there a way I can
>>>> change OpenMPI's behavior so it always thinks it's running on a single
>>>> node, regardless of the type of job I submit to the batch system?
>>>> 
>>>> Thank you,
>>>> 
>>>> - Lee-Ping Wang (Postdoc in Dept. of Chemistry, Stanford University)
>>>> 
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
>>>> [compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
>>>> [compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
>>>> [compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
>>>> [compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
>>>> [compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
