Hmmm...yes, I guess we did get off-track then. This solution is exactly what I proposed in my first response to your thread, and it was repeated by others later on. :-/
So long as mpirun is executed on the node where the "sister mom" is located, and as long as your script "B" does -not- include an "mpirun" cmd, this will work fine.

On Apr 4, 2011, at 8:38 AM, Laurence Marks wrote:

> Thanks, I think we may have a miscommunication here; I assume that the computer where they have disabled rsh and ssh still has "something" to communicate with, so we don't need to use pbsdsh. If it doesn't, there is not much a lowly user like me can do.
>
> I think we can close this, since like many things the answer is "simple" when you find it, and I think I have. Forget pbsdsh, which seems to be a bit flaky and probably is not being maintained much. Instead, use mpirun to replace ssh. In other words, replace
>
> ssh A B
>
> which executes command B on node A, with
>
> mpirun -np 1 --host A bash -c " B "
>
> (with variables appropriately substituted, or with csh instead of bash). Then -x (in OMPI) can be used to export whatever is needed in the environment, which pbsdsh lacks, and other MPIs should have similar environment exporting. With whatever minor changes are needed for other flavors of MPI, I believe this should be 99% robust and portable. It passes the simple test with B set to "sleep 600": terminating the process where the mpirun is launched kills the sleep on the remote node (unlike ssh on some, but not all, computers).
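A minimal sketch of that substitution as a drop-in wrapper, assuming bash and Open MPI inside the batch job; the script name remote_cmd.sh and the exported SCRATCH variable are illustrative only, not part of the thread:

    #!/bin/bash
    # remote_cmd.sh -- illustrative stand-in for "ssh <node> <command>" on
    # clusters where rsh/ssh are disabled.  Inside a Torque job, mpirun
    # starts the process on the sister node through its own launch
    # mechanism instead of ssh.  Per the caveat above, the command itself
    # must not contain another mpirun.
    node="$1"; shift

    # -np 1 runs a single copy of the command on the named host; -x exports
    # the listed environment variables to the remote shell (SCRATCH is just
    # an example), which plain pbsdsh does not do.
    mpirun -np 1 --host "$node" \
           -x PATH -x LD_LIBRARY_PATH -x SCRATCH \
           bash -c "$*"

Called as, say, remote_cmd.sh <node> "sleep 600", killing the local mpirun also terminates the remote sleep, which matches the test described above; careful quoting of more complicated commands is left to the caller.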
> On Mon, Apr 4, 2011 at 6:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> I apologize - I realized late last night that I had a typo in my recommended command. It should read:
>>
>> mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm --machinefile m1....
>>
>> (the change is "-mca plm_rsh_agent" in place of "-launch-agent")
>>
>> Also, if you know that #procs <= #cores on your nodes, you can greatly improve performance by adding "--bind-to-core".
>>
>> On Apr 3, 2011, at 5:28 PM, Laurence Marks wrote:
>>
>>> And, before someone wonders, while Wien2k is a commercial code it is about 500 euros for a lifetime licence, so this is not the same as VASP or Gaussian, which cost $$$$$. And I have no financial interest in the code, but like many others I help make it better (semi-GNU).
>>>
>>> On Sun, Apr 3, 2011 at 6:25 PM, Laurence Marks <l-ma...@northwestern.edu> wrote:
>>>> Thanks. I will test this tomorrow.
>>>>
>>>> Many people run Wien2k with openmpi as you say; I only became aware of the issue of Wien2k (and perhaps other codes) leaving orphaned processes still running a few days ago. I also know someone who wants to run Wien2k on a system where both rsh and ssh are banned. Personally, as I don't want to be banned from the supercomputers I use, I want to find an adequate patch for myself --- and then try to persuade the developers to adopt it.
>>>>
>>>> On Sun, Apr 3, 2011 at 6:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
>>>>>
>>>>>> On Sun, Apr 3, 2011 at 5:08 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>>>> On 03.04.2011 at 23:59, David Singleton wrote:
>>>>>>>
>>>>>>>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>>>>>>>
>>>>>>>>> What I still don't understand is why you are trying to do it this way. Why not just run
>>>>>>>>>
>>>>>>>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>>>>>>>>
>>>>>>>>> where machineN contains the names of the nodes where you want the MPI apps to execute? mpirun will only execute apps on those nodes, so this accomplishes the same thing as your script - only with a lot less pain.
>>>>>>>>>
>>>>>>>>> Your script would just contain a sequence of these commands, each with its number of procs and machinefile as required.
>>>>>>>>
>>>>>>>> Maybe I missed why this suggestion of forgetting about ssh/pbsdsh altogether was not feasible? Just use mpirun (with its great tm support!) to distribute MPI jobs.
>>>>>>>
>>>>>>> Wien2k has a two-stage startup, e.g. for 16 slots:
>>>>>>>
>>>>>>> a) start `ssh` 4 times in the background to reach some of the granted nodes
>>>>>>> b) on each of those nodes, use `mpirun` to start processes on the remaining nodes, 3 for each call
>>>>>>>
>>>>>>> Problems:
>>>>>>>
>>>>>>> 1) controlling `ssh` under Torque
>>>>>>> 2) providing a partial hostlist to `mpirun`, maybe by disabling the default tight integration
>>>>>>>
>>>>>>> -- Reuti
>>>>>>
>>>>>> 1) The mpi tasks can be started on only one node (Reuti, "setenv MPI_REMOTE 0" in parallel_options, which was introduced for other reasons in 9.3 and later releases). That seems to be safe and maybe the only viable method with OMPI, as pbsdsh appears to be unable to launch mpi tasks correctly (or needs some environment variables that I don't know about).
>>>>>> 2) This is already done (Reuti, this is .machine0, .machine1 etc. If you need information about setting up the Wien2k file under qsub in general, contact me offline or look for Machines2W on the mailing list; it may be part of the next release, but I'm not sure and I don't make those decisions).
>>>>>>
>>>>>> However, there is another layer that Reuti did not mention for this code, which is that some processes also need to be remotely launched to ensure that the correct scratch directories are used (i.e. local storage, which is faster than nfs or similar). Maybe pbsdsh can be used for this; I am still testing and I am not sure. It may be enough to create a script with all the important environment variables exported (as they may not all be in .bashrc or .cshrc), although there might be issues making this fully portable. Since there are > 1000 licenses of Wien2k, it has to be able to cope with different OS's, and not just OMPI.
>>>>>
>>>>> Here is what I would do, based on my knowledge of OMPI's internals (and I wrote the launchers :-)):
>>>>>
>>>>> 1. do not use your script - you don't want all those PBS envars to confuse OMPI
>>>>>
>>>>> 2. mpirun -mca plm rsh -launch-agent pbsdsh -mca ras ^tm --machinefile m1....
>>>>>
>>>>> This cmd line tells mpirun to use the "rsh/ssh" launcher, but to substitute "pbsdsh" for "ssh". It also tells it to ignore the PBS_NODEFILE and just use the machinefile for the nodes to be used for that job.
>>>>>
>>>>> I can't swear this will work, as I have never verified that pbsdsh and ssh have the same syntax, but I -think- that is true. If so, then this might do what you are attempting.
>>>>>
>>>>> I know people have run Wien2k with OMPI before - but I have never heard of the problems you are reporting.
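A sketch of how that corrected command (with the plm_rsh_agent spelling from the later reply above) might sit inside a Torque batch script. Only the mpirun options are taken from the thread; the resource request, the machinefile size, and ./my_mpi_app are placeholders, and, as noted above, it is unverified that pbsdsh accepts ssh-style arguments:

    #!/bin/bash
    # Hypothetical Torque job script; the resource request and application
    # name are placeholders, not from the thread.
    #PBS -l nodes=2:ppn=8

    cd $PBS_O_WORKDIR

    # Machinefile listing only the slots this particular mpirun should use
    # (here, the first 8 of the 16 granted slots).
    head -8 $PBS_NODEFILE > m1

    # Use the rsh/ssh launcher but substitute pbsdsh for ssh, and disable
    # the tm RAS so mpirun takes its hosts from m1 rather than from the
    # full PBS_NODEFILE.  --bind-to-core can help when #procs <= #cores.
    mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm \
           --machinefile m1 --bind-to-core -np 8 ./my_mpi_app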
>>>>>>>> A simple example:
>>>>>>>>
>>>>>>>> vayu1:~/MPI > qsub -lncpus=24,vmem=24gb,walltime=10:00 -wd -I
>>>>>>>> qsub: waiting for job 574900.vu-pbs to start
>>>>>>>> qsub: job 574900.vu-pbs ready
>>>>>>>>
>>>>>>>> [dbs900@v250 ~/MPI]$ wc -l $PBS_NODEFILE
>>>>>>>> 24
>>>>>>>> [dbs900@v250 ~/MPI]$ head -12 $PBS_NODEFILE > m1
>>>>>>>> [dbs900@v250 ~/MPI]$ tail -12 $PBS_NODEFILE > m2
>>>>>>>> [dbs900@v250 ~/MPI]$ mpirun --machinefile m1 ./a2a143 120000 30 &
>>>>>>>> mpirun --machinefile m2 ./pp143
>>>>>>>>
>>>>>>>> Check how the processes are distributed ...
>>>>>>>>
>>>>>>>> vayu1:~ > qps 574900.vu-pbs
>>>>>>>> Node 0: v250:
>>>>>>>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>>>>>>>> 11420 S   2104  10396  0.0 00:00:00  0.0 -tcsh
>>>>>>>> 11421 S    620  10552  0.0 00:00:00  0.0 pbs_demux
>>>>>>>> 12471 S   2208  49324  0.0 00:00:00  0.9 /apps/openmpi/1.4.3/bin/mpirun --machinefile m1 ./a2a143 120000 30
>>>>>>>> 12472 S   2116  49312  0.0 00:00:00  0.0 /apps/openmpi/1.4.3/bin/mpirun --machinefile m2 ./pp143
>>>>>>>> 12535 R 270160 565668  1.0 00:00:02 82.4 ./a2a143 120000 30
>>>>>>>> 12536 R 270032 565536  1.0 00:00:02 81.4 ./a2a143 120000 30
>>>>>>>> 12537 R 270012 565528  1.0 00:00:02 87.3 ./a2a143 120000 30
>>>>>>>> 12538 R 269992 565532  1.0 00:00:02 93.3 ./a2a143 120000 30
>>>>>>>> 12539 R 269980 565516  1.0 00:00:02 81.4 ./a2a143 120000 30
>>>>>>>> 12540 R 270008 565516  1.0 00:00:02 86.3 ./a2a143 120000 30
>>>>>>>> 12541 R 270008 565516  1.0 00:00:02 96.3 ./a2a143 120000 30
>>>>>>>> 12542 R 272064 567568  1.0 00:00:02 91.3 ./a2a143 120000 30
>>>>>>>> Node 1: v251:
>>>>>>>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>>>>>>>> 10367 S   1872  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444413440 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1444413440.0;tcp://10.1.3.58:37339"
>>>>>>>> 10368 S   1868  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>>>>>>>> 10372 R 271112 567556  1.0 00:00:04 74.5 ./a2a143 120000 30
>>>>>>>> 10373 R 271036 567564  1.0 00:00:04 71.5 ./a2a143 120000 30
>>>>>>>> 10374 R 271032 567560  1.0 00:00:04 66.5 ./a2a143 120000 30
>>>>>>>> 10375 R 273112 569612  1.1 00:00:04 68.5 ./a2a143 120000 30
>>>>>>>> 10378 R 552280 840712  2.2 00:00:04  100 ./pp143
>>>>>>>> 10379 R 552280 840708  2.2 00:00:04  100 ./pp143
>>>>>>>> 10380 R 552328 841576  2.2 00:00:04  100 ./pp143
>>>>>>>> 10381 R 552788 841216  2.2 00:00:04 99.3 ./pp143
>>>>>>>> Node 2: v252:
>>>>>>>>   PID S    RSS    VSZ %MEM     TIME %CPU COMMAND
>>>>>>>> 10152 S   1908  40780  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>>>>>>>> 10156 R 552384 840200  2.2 00:00:07 99.3 ./pp143
>>>>>>>> 10157 R 551868 839692  2.2 00:00:06 99.3 ./pp143
>>>>>>>> 10158 R 551400 839184  2.2 00:00:07  100 ./pp143
>>>>>>>> 10159 R 551436 839184  2.2 00:00:06 98.3 ./pp143
>>>>>>>> 10160 R 551760 839692  2.2 00:00:07  100 ./pp143
>>>>>>>> 10161 R 551788 839824  2.2 00:00:07 97.3 ./pp143
>>>>>>>> 10162 R 552256 840332  2.2 00:00:07  100 ./pp143
>>>>>>>> 10163 R 552216 840340  2.2 00:00:07 99.3 ./pp143
>>>>>>>>
>>>>>>>> You would have to do something smarter to get correct process binding etc.
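The same pattern could be written as a non-interactive batch script along these lines. The resource request and the two executables (a2a143, pp143) come from the session above; the use of PBS_O_WORKDIR and the trailing wait are assumptions about a typical Torque/PBS setup, and, as noted, process binding would still need something smarter:

    #!/bin/bash
    # Sketch of the interactive example above as a batch job.
    #PBS -l ncpus=24,vmem=24gb,walltime=10:00

    cd $PBS_O_WORKDIR

    # Split the 24 granted slots into two disjoint machinefiles.
    head -12 $PBS_NODEFILE > m1
    tail -12 $PBS_NODEFILE > m2

    # Run two independent MPI jobs side by side within the one PBS job,
    # each confined to the nodes listed in its own machinefile.
    mpirun --machinefile m1 ./a2a143 120000 30 &
    mpirun --machinefile m2 ./pp143 &

    # Wait for both mpiruns so the PBS job does not exit early.
    wait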