On Apr 3, 2011, at 4:56 PM, Ralph Castain wrote:

> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
> 
>> Let me expand on this slightly (in response to Ralph Castain's posting
>> -- I had digest mode set). As currently constructed, a shell script in
>> Wien2k (www.wien2k.at) launches a series of tasks using
>> 
>> ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >> .time1_$loop &
>> 
>> where the standard setting for "remote" is "ssh", remotemachine is the
>> appropriate host, "t" is "time", and "ttt" is a concatenation of
>> commands. For instance, when using 2 cores on one node for Task1, 2
>> cores on 2 nodes for Task2, and 2 cores on 1 node for Task3:
>> 
>> Task1:
>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine1
>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>> Task2:
>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile .machine2
>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def
>> Task3:
>> mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine3
>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_3.def
>> 
>> This is a stable script; it works under SGI, Linux, MVAPICH and many
>> others, using ssh or rsh (although I've never used it with rsh myself).
>> It is general purpose, i.e. it will run just 1 task on 8x8 nodes/cores,
>> or 8 parallel tasks on 8 nodes each with 8 cores, or any scatter of
>> nodes/cores.
>> 
>> According to some, ssh is becoming obsolete on supercomputers, and the
>> "replacement" is pbsdsh, at least under Torque.
> 
> Somebody is playing an April Fools joke on you. The majority of 
> supercomputers use ssh as their sole launch mechanism, and I have seen no
> indication that anyone intends to change that situation. That said, Torque is 
> certainly popular and a good environment.

I operate my Linux clusters without `ssh` or `rsh`; I use SGE's `qrsh` instead. 
How else would you get tight integration with correct accounting and job 
control? This might be different when you have an AIX or NEC SX machine, as 
they provide additional control mechanisms.
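
For example, one could point the Wien2k "remote" setting at qrsh instead of
ssh. A minimal sketch in bash syntax, assuming an SGE parallel environment
configured with control_slaves TRUE, with "node42" as a hypothetical host
granted in $PE_HOSTFILE:

# hedged sketch: start the remote task via SGE's tight integration, so
# sge_execd handles accounting and job control for it
remote="qrsh -inherit -V"    # -V exports the current environment
remotemachine="node42"       # must be a host from $PE_HOSTFILE
$remote $remotemachine "cd $PWD; $t $ttt" >> .time1_$loop &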

-- Reuti


>> Getting pbsdsh to work is certainly not as simple as the documentation
>> I've seen suggests. To get it even partially working, I am using for
>> "remote" a script "pbsh" which writes HOME, PATH, LD_LIBRARY_PATH, etc.,
>> as well as the PBS environment variables listed at the bottom of
>> http://www.bear.bham.ac.uk/bluebear/pbsdsh.shtml plus PBS_NODEFILE, to
>> an executable bash file $PBS_O_WORKDIR/.tmp_$1, followed by the relevant
>> command, and then runs
>> 
>> pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1  "
>> 
>> This works fine so long as Task2 does not span 2 nodes (probably 3 as
>> well; I've not tested this). If it does, there is a communications
>> failure and nothing is launched on the 2nd node of Task2.
>> 
>> I'm including the script below, as maybe some other environment
>> variables are needed, or some should not be there, in order to properly
>> rebuild the environment so things will work. (And yes, I know there
>> should be tests to see whether the variables are set first and so
>> forth; this is not so clean, just an initial version.)
> 
> By providing all those PBS-related envars to OMPI, you are causing OMPI to 
> think it should use Torque as the launch mechanism. Unfortunately, that won't 
> work in this scenario.
> 
> When you start a Torque job (get an allocation etc.), Torque puts you on one 
> of the allocated nodes and creates a "sister mom" on that node. This is your 
> job's "master node". All Torque-based launches must take place from that 
> location.
> 
> So when you pbsdsh to another node and attempt to execute mpirun with those 
> envars set, mpirun attempts to contact the local "sister mom" so it can order 
> the launch of any daemons on other nodes... only the "sister mom" isn't 
> there! So the connection fails and mpirun aborts.
> 
> If mpirun is -only- launching procs on the local node, then it doesn't need 
> to launch another daemon (as mpirun will host the local procs itself), and so 
> it doesn't attempt to contact the "sister mom" and the comm failure doesn't 
> occur.
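> 
> If you nevertheless want to run mpirun from a pbsdsh-launched shell, a
> hedged sketch (untested here; verify the MCA parameter against your
> installed Open MPI version) is to leave the PBS variables out of the
> wrapper, or to force the rsh/ssh launcher explicitly:
> 
> # don't export PBS_JOBID, PBS_ENVIRONMENT etc. in .tmp_$1, or select
> # the rsh/ssh launcher so mpirun never looks for the local MOM
> mpirun -mca plm rsh -v -x LD_LIBRARY_PATH -x PATH -np 4 \
>     -machinefile .machine2 /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def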
> 
> What I still don't understand is why you are trying to do it this way. Why 
> not just run
> 
> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN 
> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_N.def
> 
> where machineN contains the names of the nodes where you want the MPI apps to 
> execute? mpirun will only execute apps on those nodes, so this accomplishes 
> the same thing as your script - only with a lot less pain.
> 
> Your script would just contain a sequence of these commands, each with its 
> number of procs and machinefile as required.
> 
> Actually, it would be pretty much identical to the script I use when doing 
> scaling tests...
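> 
> A sketch of that structure, keeping the original script's backgrounding
> so the three tasks still run concurrently (the "wait" is an addition):
> 
> #!/bin/bash
> # .machine1/.machine2/.machine3 each list the nodes for one task
> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine1 \
>     /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def &
> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile .machine2 \
>     /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_2.def &
> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine3 \
>     /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_3.def &
> wait    # all three mpiruns execute from the job's master node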
> 
> 
>> 
>> ----------
>> # Script to replace ssh by pbsdsh
>> # Beta version, April 2011, L. D. Marks
>> #
>> # Remove old file -- needed !
>> rm -f $PBS_O_WORKDIR/.tmp_$1
>> 
>> # Create a script that exports the environment we have
>> # This may not be enough
>> echo '#!/bin/bash' > $PBS_O_WORKDIR/.tmp_$1   # quoted; an unquoted # comments out the redirection
>> echo source $HOME/.bashrc                       >> $PBS_O_WORKDIR/.tmp_$1
>> echo cd $PBS_O_WORKDIR                          >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PATH=$PBS_O_PATH                    >> $PBS_O_WORKDIR/.tmp_$1
>> echo export TMPDIR=$TMPDIR                      >> $PBS_O_WORKDIR/.tmp_$1
>> echo export SCRATCH=$SCRATCH                    >> $PBS_O_WORKDIR/.tmp_$1
>> echo export LD_LIBRARY_PATH=$LD_LIBRARY_PATH    >> $PBS_O_WORKDIR/.tmp_$1
>> 
>> # Open MPI needs to have this defined, even if we don't use it
>> echo export PBS_NODEFILE=$PBS_NODEFILE >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_ENVIRONMENT=$PBS_ENVIRONMENT    >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_JOBCOOKIE=$PBS_JOBCOOKIE        >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_JOBID=$PBS_JOBID                >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_JOBNAME=$PBS_JOBNAME            >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_MOMPORT=$PBS_MOMPORT            >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_NODENUM=$PBS_NODENUM            >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_HOME=$PBS_O_HOME              >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_HOST=$PBS_O_HOST              >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_LANG=$PBS_O_LANG              >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_LOGNAME=$PBS_O_LOGNAME        >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_MAIL=$PBS_O_MAIL              >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_PATH=$PBS_O_PATH              >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_QUEUE=$PBS_O_QUEUE            >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_SHELL=$PBS_O_SHELL            >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_O_WORKDIR=$PBS_O_WORKDIR        >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_QUEUE=$PBS_QUEUE                >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_TASKNUM=$PBS_TASKNUM            >> $PBS_O_WORKDIR/.tmp_$1
>> echo export PBS_VNODENUM=$PBS_VNODENUM          >> $PBS_O_WORKDIR/.tmp_$1
>> 
>> # Now the command we want to run
>> echo $2 >> $PBS_O_WORKDIR/.tmp_$1
>> 
>> # Make it executable
>> chmod a+x $PBS_O_WORKDIR/.tmp_$1
>> 
>> pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1  "
>> 
>> #Cleanup if needed (commented out for debugging)
>> #rm $PBS_O_WORKDIR/.tmp_$1
>> 
>> 
>> On Sat, Apr 2, 2011 at 9:36 PM, Laurence Marks <l-ma...@northwestern.edu> 
>> wrote:
>>> I have a problem which may or may not be an Open MPI issue, but since
>>> this list was useful before with a race condition, I am posting here.
>>> 
>>> I am trying to use pbsdsh as an ssh replacement, pushed by sysadmins
>>> because Torque does not know about ssh tasks launched from within a
>>> job. In a simple case, a script launches three MPI tasks in parallel,
>>> 
>>> Task1: NodeA
>>> Task2: NodeB and NodeC
>>> Task3: NodeD
>>> 
>>> (some cores on each, all handled correctly). Reproducibly (but with
>>> different nodes and numbers of cores), Task1 and Task3 work fine; the
>>> MPI task starts on NodeB but nothing starts on NodeC, and it appears
>>> that NodeC does not communicate. It does not have to be this
>>> combination; it could be
>>> 
>>> Task1: NodeA NodeB
>>> Task2: NodeC NodeD
>>> 
>>> Here NodeC will start, and it looks as if NodeD never starts anything.
>>> I've also run it with 4 tasks (1, 3, and 4 work), and if Task2 only
>>> uses one node (the number of cores does not matter) it is fine.
>>> 
>> 
>> 
>> 
>> -- 
>> Laurence Marks
>> Department of Materials Science and Engineering
>> MSE Rm 2036 Cook Hall
>> 2220 N Campus Drive
>> Northwestern University
>> Evanston, IL 60208, USA
>> Tel: (847) 491-3996 Fax: (847) 491-7820
>> email: L-marks at northwestern dot edu
>> Web: www.numis.northwestern.edu
>> Chair, Commission on Electron Crystallography of IUCR
>> www.numis.northwestern.edu/
>> Research is to see what everybody else has seen, and to think what
>> nobody else has thought
>> Albert Szent-Györgyi
>> 

