On Sun, Apr 3, 2011 at 5:08 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 03.04.2011 at 23:59, David Singleton wrote:
>
>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>
>>> What I still don't understand is why you are trying to do it this way.
>>> Why not just run
>>>
>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN
>>> /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>>
>>> where machineN contains the names of the nodes where you want the MPI
>>> apps to execute? mpirun will only execute apps on those nodes, so this
>>> accomplishes the same thing as your script - only with a lot less pain.
>>>
>>> Your script would just contain a sequence of these commands, each with
>>> its number of procs and machinefile as required.
>>
>> Maybe I missed why this suggestion of forgetting about the ssh/pbsdsh
>> altogether was not feasible? Just use mpirun (with its great tm
>> support!) to distribute MPI jobs.
>
> Wien2k has a two-stage startup, e.g. for 16 slots:
>
> a) start `ssh` 4 times in the background to go to some of the granted
> nodes
> b) on each of those nodes, use `mpirun` to start processes on the
> remaining nodes, 3 for each call
>
> Problems:
>
> 1) controlling `ssh` under Torque
> 2) providing a partial hostlist to `mpirun`, maybe by disabling the
> default tight integration
>
> -- Reuti
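For concreteness, Ralph's approach would reduce the job script to a
sequence of mpirun invocations, one per machinefile. The following is a
rough sketch only: the use of `split` to carve $PBS_NODEFILE into four
4-slot machinefiles and the numbered .def file names are illustrative
assumptions, not what Wien2k itself generates. The calls are backgrounded
(as in David's example further down) so the four runs proceed
concurrently, the way Wien2k's background ssh calls would:

#!/bin/sh
# Sketch: split the 16-line Torque node list into four 4-line
# machinefiles (split names them .machineaa, .machineab, ...),
# then start one 4-way mpirun per file. The -x flags export the
# environment, as in Ralph's command above.
split -l 4 $PBS_NODEFILE .machine
i=1
for f in .machinea?; do
  mpirun -x LD_LIBRARY_PATH -x PATH -np 4 -machinefile $f \
      /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_${i}.def &
  i=`expr $i + 1`
done
wait   # all four mpiruns run concurrently; wait for them to finish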
1) The mpi tasks can be started on only one node (Reuti: "setenv
MPI_REMOTE 0" in parallel_options, which was introduced for other reasons
in 9.3 and later releases). That seems to be safe, and it may be the only
viable method with OMPI, as pbsdsh appears to be unable to launch mpi
tasks correctly (or needs some environment variables that I don't know
about).

2) This is already done (Reuti: these are .machine0, .machine1, etc. If
you need information about setting up the Wien2k files under qsub in
general, contact me offline, or look for Machines2W on the mailing list;
it may be part of the next release, but I'm not sure and I don't make
those decisions).

However, there is another layer, which Reuti did not mention, for this
code: some processes also need to be launched remotely to ensure that the
correct scratch directories are used (i.e. local storage, which is faster
than NFS or similar). Maybe pbsdsh can be used for this; I am still
testing and am not sure. It may be enough to create a script that exports
all the important environment variables (as they may not all be in
.bashrc or .cshrc), although there might be issues making this fully
portable. Since there are > 1000 licenses of Wien2k, it has to be able to
cope with different OSs and MPI implementations, not just OMPI.
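One possible shape for such a script is a small wrapper that re-exports
the environment and changes into the node-local scratch directory before
exec'ing the real command. This is a minimal sketch under stated
assumptions: the install paths, the SCRATCH location, and the wrapper
name are placeholders, not Wien2k's actual settings:

#!/bin/sh
# wrapper.sh (hypothetical): re-create the job environment on a remote
# node, since pbsdsh/ssh may not source .bashrc/.cshrc. All paths below
# are illustrative assumptions.
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
export SCRATCH=/scratch/$USER    # fast node-local storage
cd $SCRATCH || exit 1            # run from local scratch, not NFS
exec "$@"                        # hand off to the real command

It could then be launched remotely with something like
"pbsdsh -h <node> /path/to/wrapper.sh <command>" (exact pbsdsh options
vary between Torque versions), or with plain ssh.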
>> A simple example:
>>
>> vayu1:~/MPI > qsub -lncpus=24,vmem=24gb,walltime=10:00 -wd -I
>> qsub: waiting for job 574900.vu-pbs to start
>> qsub: job 574900.vu-pbs ready
>>
>> [dbs900@v250 ~/MPI]$ wc -l $PBS_NODEFILE
>> 24
>> [dbs900@v250 ~/MPI]$ head -12 $PBS_NODEFILE > m1
>> [dbs900@v250 ~/MPI]$ tail -12 $PBS_NODEFILE > m2
>> [dbs900@v250 ~/MPI]$ mpirun --machinefile m1 ./a2a143 120000 30 & mpirun --machinefile m2 ./pp143
>>
>> Check how the processes are distributed ...
>>
>> vayu1:~ > qps 574900.vu-pbs
>> Node 0: v250:
>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>> 11420 S   2104  10396  0.0 00:00:00  0.0 -tcsh
>> 11421 S    620  10552  0.0 00:00:00  0.0 pbs_demux
>> 12471 S   2208  49324  0.0 00:00:00  0.9 /apps/openmpi/1.4.3/bin/mpirun --machinefile m1 ./a2a143 120000 30
>> 12472 S   2116  49312  0.0 00:00:00  0.0 /apps/openmpi/1.4.3/bin/mpirun --machinefile m2 ./pp143
>> 12535 R 270160 565668  1.0 00:00:02 82.4 ./a2a143 120000 30
>> 12536 R 270032 565536  1.0 00:00:02 81.4 ./a2a143 120000 30
>> 12537 R 270012 565528  1.0 00:00:02 87.3 ./a2a143 120000 30
>> 12538 R 269992 565532  1.0 00:00:02 93.3 ./a2a143 120000 30
>> 12539 R 269980 565516  1.0 00:00:02 81.4 ./a2a143 120000 30
>> 12540 R 270008 565516  1.0 00:00:02 86.3 ./a2a143 120000 30
>> 12541 R 270008 565516  1.0 00:00:02 96.3 ./a2a143 120000 30
>> 12542 R 272064 567568  1.0 00:00:02 91.3 ./a2a143 120000 30
>> Node 1: v251:
>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>> 10367 S   1872  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444413440 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1444413440.0;tcp://10.1.3.58:37339"
>> 10368 S   1868  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>> 10372 R 271112 567556  1.0 00:00:04 74.5 ./a2a143 120000 30
>> 10373 R 271036 567564  1.0 00:00:04 71.5 ./a2a143 120000 30
>> 10374 R 271032 567560  1.0 00:00:04 66.5 ./a2a143 120000 30
>> 10375 R 273112 569612  1.1 00:00:04 68.5 ./a2a143 120000 30
>> 10378 R 552280 840712  2.2 00:00:04  100 ./pp143
>> 10379 R 552280 840708  2.2 00:00:04  100 ./pp143
>> 10380 R 552328 841576  2.2 00:00:04  100 ./pp143
>> 10381 R 552788 841216  2.2 00:00:04 99.3 ./pp143
>> Node 2: v252:
>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>> 10152 S   1908  40780  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>> 10156 R 552384 840200  2.2 00:00:07 99.3 ./pp143
>> 10157 R 551868 839692  2.2 00:00:06 99.3 ./pp143
>> 10158 R 551400 839184  2.2 00:00:07  100 ./pp143
>> 10159 R 551436 839184  2.2 00:00:06 98.3 ./pp143
>> 10160 R 551760 839692  2.2 00:00:07  100 ./pp143
>> 10161 R 551788 839824  2.2 00:00:07 97.3 ./pp143
>> 10162 R 552256 840332  2.2 00:00:07  100 ./pp143
>> 10163 R 552216 840340  2.2 00:00:07 99.3 ./pp143
>>
>> You would have to do something smarter to get correct process binding etc.

-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
"Research is to see what everybody else has seen, and to think what
nobody else has thought" - Albert Szent-Gyorgi