Thanks. I will test this tomorrow. Many people run Wien2k with OpenMPI as you say; I only became aware a few days ago of the issue of Wien2k (and perhaps other codes) leaving orphaned processes running. I also know someone who wants to run Wien2k on a system where both rsh and ssh are banned. Personally, since I don't want to be banned from the supercomputers I use, I want to find an adequate patch for myself, and then try to persuade the developers to adopt it.
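Concretely, here is roughly what I plan to test first, i.e. your launcher substitution applied to one of the lapw1 calls quoted below. Treat it as a sketch only: I have not run it yet, and the .machine1 file is just whatever Wien2k writes for that particular job.

    # Rough sketch (untested): use the rsh/ssh launcher but substitute pbsdsh,
    # and ignore PBS_NODEFILE in favour of the Wien2k machinefile.
    mpirun -mca plm rsh -launch-agent pbsdsh -mca ras ^tm \
           -x LD_LIBRARY_PATH -x PATH -np 2 --machinefile .machine1 \
           /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def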
On Sun, Apr 3, 2011 at 6:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
>
>> On Sun, Apr 3, 2011 at 5:08 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> On 03.04.2011 at 23:59, David Singleton wrote:
>>>
>>>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>>>
>>>>> What I still don't understand is why you are trying to do it this way. Why not just run
>>>>>
>>>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>>>>
>>>>> where machineN contains the names of the nodes where you want the MPI apps to execute? mpirun will only execute apps on those nodes, so this accomplishes the same thing as your script - only with a lot less pain.
>>>>>
>>>>> Your script would just contain a sequence of these commands, each with its number of procs and machinefile as required.
>>>>
>>>> Maybe I missed why this suggestion of forgetting about the ssh/pbsdsh altogether was not feasible? Just use mpirun (with its great tm support!) to distribute MPI jobs.
>>>
>>> Wien2k has a two-stage startup, e.g. for 16 slots:
>>>
>>> a) start `ssh` 4 times in the background to reach some of the granted nodes
>>> b) on each of those nodes, use `mpirun` to start processes on the remaining nodes, 3 for each call
>>>
>>> Problems:
>>>
>>> 1) control `ssh` under Torque
>>> 2) provide a partial hostlist to `mpirun`, maybe by disabling the default tight integration
>>>
>>> -- Reuti
>>
>> 1) The MPI tasks can be started on only one node (Reuti, "setenv MPI_REMOTE 0" in parallel_options, which was introduced for other reasons in 9.3 and later releases). That seems to be safe, and maybe the only viable method with OMPI, as pbsdsh appears to be unable to launch MPI tasks correctly (or needs some environment variables that I don't know about).
>> 2) This is already done (Reuti, this is .machine0, .machine1 etc. If you need information about setting up the Wien2k file under qsub in general, contact me offline or look for Machines2W on the mailing list, which may be part of the next release; I'm not sure and I don't make those decisions).
>>
>> However, there is another layer that Reuti did not mention for this code: some processes also need to be remotely launched to ensure that the correct scratch directories are used (i.e. local storage, which is faster than NFS or similar). Maybe pbsdsh can be used for this; I am still testing and I am not sure. It may be enough to create a script with all important environment variables exported (as they may not all be in .bashrc or .cshrc), although there might be issues making this fully portable. Since there are more than 1000 licenses of Wien2k, it has to be able to cope with different OSs, not just OMPI.
>
> Here is what I would do, based on my knowledge of OMPI's internals (and I wrote the launchers :-)):
>
> 1. Do not use your script - you don't want all those PBS envars to confuse OMPI.
>
> 2. mpirun -mca plm rsh -launch-agent pbsdsh -mca ras ^tm --machinefile m1....
>
> This command line tells mpirun to use the "rsh/ssh" launcher, but to substitute "pbsdsh" for "ssh". It also tells it to ignore the PBS_NODEFILE and just use the machinefile for the nodes to be used for that job.
>
> I can't swear this will work as I have never verified that pbsdsh and ssh have the same syntax, but I -think- that was true.
> If so, then this might do what you are attempting.
>
> I know people have run Wien2k with OMPI before - but I have never heard of the problems you are reporting.
>
>>>> A simple example:
>>>>
>>>> vayu1:~/MPI > qsub -lncpus=24,vmem=24gb,walltime=10:00 -wd -I
>>>> qsub: waiting for job 574900.vu-pbs to start
>>>> qsub: job 574900.vu-pbs ready
>>>>
>>>> [dbs900@v250 ~/MPI]$ wc -l $PBS_NODEFILE
>>>> 24
>>>> [dbs900@v250 ~/MPI]$ head -12 $PBS_NODEFILE > m1
>>>> [dbs900@v250 ~/MPI]$ tail -12 $PBS_NODEFILE > m2
>>>> [dbs900@v250 ~/MPI]$ mpirun --machinefile m1 ./a2a143 120000 30 & mpirun --machinefile m2 ./pp143
>>>>
>>>> Check how the processes are distributed ...
>>>>
>>>> vayu1:~ > qps 574900.vu-pbs
>>>> Node 0: v250:
>>>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>>>> 11420 S   2104  10396  0.0 00:00:00  0.0 -tcsh
>>>> 11421 S    620  10552  0.0 00:00:00  0.0 pbs_demux
>>>> 12471 S   2208  49324  0.0 00:00:00  0.9 /apps/openmpi/1.4.3/bin/mpirun --machinefile m1 ./a2a143 120000 30
>>>> 12472 S   2116  49312  0.0 00:00:00  0.0 /apps/openmpi/1.4.3/bin/mpirun --machinefile m2 ./pp143
>>>> 12535 R 270160 565668  1.0 00:00:02 82.4 ./a2a143 120000 30
>>>> 12536 R 270032 565536  1.0 00:00:02 81.4 ./a2a143 120000 30
>>>> 12537 R 270012 565528  1.0 00:00:02 87.3 ./a2a143 120000 30
>>>> 12538 R 269992 565532  1.0 00:00:02 93.3 ./a2a143 120000 30
>>>> 12539 R 269980 565516  1.0 00:00:02 81.4 ./a2a143 120000 30
>>>> 12540 R 270008 565516  1.0 00:00:02 86.3 ./a2a143 120000 30
>>>> 12541 R 270008 565516  1.0 00:00:02 96.3 ./a2a143 120000 30
>>>> 12542 R 272064 567568  1.0 00:00:02 91.3 ./a2a143 120000 30
>>>> Node 1: v251:
>>>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>>>> 10367 S   1872  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444413440 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1444413440.0;tcp://10.1.3.58:37339"
>>>> 10368 S   1868  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>>>> 10372 R 271112 567556  1.0 00:00:04 74.5 ./a2a143 120000 30
>>>> 10373 R 271036 567564  1.0 00:00:04 71.5 ./a2a143 120000 30
>>>> 10374 R 271032 567560  1.0 00:00:04 66.5 ./a2a143 120000 30
>>>> 10375 R 273112 569612  1.1 00:00:04 68.5 ./a2a143 120000 30
>>>> 10378 R 552280 840712  2.2 00:00:04  100 ./pp143
>>>> 10379 R 552280 840708  2.2 00:00:04  100 ./pp143
>>>> 10380 R 552328 841576  2.2 00:00:04  100 ./pp143
>>>> 10381 R 552788 841216  2.2 00:00:04 99.3 ./pp143
>>>> Node 2: v252:
>>>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>>>> 10152 S   1908  40780  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>>>> 10156 R 552384 840200  2.2 00:00:07 99.3 ./pp143
>>>> 10157 R 551868 839692  2.2 00:00:06 99.3 ./pp143
>>>> 10158 R 551400 839184  2.2 00:00:07  100 ./pp143
>>>> 10159 R 551436 839184  2.2 00:00:06 98.3 ./pp143
>>>> 10160 R 551760 839692  2.2 00:00:07  100 ./pp143
>>>> 10161 R 551788 839824  2.2 00:00:07 97.3 ./pp143
>>>> 10162 R 552256 840332  2.2 00:00:07  100 ./pp143
>>>> 10163 R 552216 840340  2.2 00:00:07 99.3 ./pp143
>>>>
>>>> You would have to do something smarter to get correct process binding etc.
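In case it is useful to anyone else following this, here is how I read the combination of the node-splitting example above with Ralph's pbsdsh suggestion, written as a batch script instead of an interactive session. This is a rough sketch only: the 12/12 split, the resource request and the test executables are copied from David's example, and whether the pbsdsh substitution actually works is exactly what I still have to verify.

    #!/bin/bash
    #PBS -l ncpus=24,vmem=24gb,walltime=10:00
    cd $PBS_O_WORKDIR

    # Split the granted nodes into two groups, as in the interactive example.
    head -12 $PBS_NODEFILE > m1
    tail -12 $PBS_NODEFILE > m2

    # One mpirun per group: each ignores PBS_NODEFILE (-mca ras ^tm) and,
    # if Ralph's suggestion holds, uses pbsdsh instead of ssh for the
    # remote launches.
    mpirun -mca plm rsh -launch-agent pbsdsh -mca ras ^tm \
           --machinefile m1 ./a2a143 120000 30 &
    mpirun -mca plm rsh -launch-agent pbsdsh -mca ras ^tm \
           --machinefile m2 ./pp143 &
    wait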
--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996  Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Research is to see what everybody else has seen, and to think what nobody else has thought
Albert Szent-Gyorgi