On 06.02.2013 at 12:23, Duke Nguyen wrote:
> On 2/6/13 1:03 AM, Gus Correa wrote:
>> On 02/05/2013 08:52 AM, Jeff Squyres (jsquyres) wrote:
>>> To add to what Reuti said: if you enable PBS support in Open MPI,
>>> then when users "mpirun ..." inside a PBS job, Open MPI will
>>> automatically use the PBS native launching mechanism, which won't
>>> let you run outside of the servers allocated to that job.
>>>
>>> Concrete example: if you qsub a job and are allocated nodes A, B,
>>> and C, but then try to run with "mpirun --host D,E,F ...", you'll
>>> get an error.
>>>
>>> That being said, keep in mind what Reuti said: if users are allowed
>>> to ssh between nodes that are not allocated to them, then they can
>>> always bypass this behavior and use Open MPI's plain ssh support to
>>> launch on nodes D, E, F (etc.).
>>>
>>> On Feb 5, 2013, at 2:46 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>> Can the users use a plain ssh between the nodes? If they are
>>>> forced to use the TM of Torque instead, it should be impossible
>>>> to start a job on a non-granted machine.
>
> Thanks Reuti and Jeff, you are right: users should not be allowed to
> ssh to all nodes, which is how our cluster was set up; users can even
> ssh password-less to any node. I know this is not an appropriate
> question for the Open MPI forum, but how can we set things up so that
> users can only ssh (with a password) to the nodes that are allocated
> to them at the time of qsub'ing? I am still new to all of this
> cluster stuff :)

I disallow even this: only admin staff are allowed to log in to the
nodes. This also forces the admin to aim for a tight integration of
the users' software into the queuing system.

-- Reuti

>> Besides Reuti's and Jeff's suggestions:
>>
>> If you build your own Torque/PBS with PAM support
>> (./configure --with-pam [other configure flags]),
>> you can prevent users who are not running a Torque/PBS job on a
>> node from launching processes on that node.
>>
>> See this:
>> http://docs.adaptivecomputing.com/torque/4-1-3/help.htm#topics/1-installConfig/customizingTheInstall.htm
>>
>> Of course, you will need to rebuild your Open MPI with Torque
>> support after you install a version of Torque with PAM support.
>>
>> This is mostly a Torque/Maui issue, with a bit of an MPI issue. You
>> may get more help on the Torque and Maui mailing lists, and in
>> their archives you may find more specific guidance on what you need
>> to add to the pam/security files to make it work.
>
> Thanks for this! I will try --with-pam when our test job finishes.
>
>> Torque with PAM support is not 100% foolproof, because users who
>> *are* running a Torque/PBS job on a node can still cheat and launch
>> more processes there, but it helps restrict the problem to that
>> case.
>>
>> Some sysadmins also add a cleanup/sweep routine to the Torque
>> epilogue script to kill any processes belonging to the user whose
>> job just finished. However, this is not very good, because that
>> user may have another legitimate job still running there. Other
>> cleanup strategies are possible, and you may find suggestions and
>> even scripts if you google around.
>>
>> Moreover, if you configure your scheduler (Maui?) to assign full
>> nodes to jobs (no node sharing), the cheaters will only be cheating
>> themselves, not stepping on other users' toes.
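For concreteness, here is a minimal sketch of what Gus's --with-pam
route can look like. The module Torque builds with this flag is
pam_pbssimpleauth.so; the PAM stacking below, and the use of
pam_access to keep admin logins working, are my assumptions, so check
them against the Torque documentation linked above before deploying:

    # Build Torque with its PAM module, then rebuild Open MPI with
    # Torque support against this installation.
    ./configure --with-pam [other configure flags]
    make
    make install               # as root; installs pam_pbssimpleauth.so

    # /etc/pam.d/sshd on each compute node (assumed stacking, verify):
    # pam_pbssimpleauth.so only admits users that have a job currently
    # running on this node; pam_access.so consults
    # /etc/security/access.conf, where admin staff can be whitelisted.
    account    required    pam_pbssimpleauth.so
    account    required    pam_access.so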
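In the same spirit, a sketch of the epilogue sweep Gus describes, with
his caveat wired in. The argument order ($1 = job id, $2 = job owner)
is Torque's epilogue convention; the MOM job-file path, the *.JB
naming, and the grep-based "does this user have another job here?"
test are assumptions to adapt for your site:

    #!/bin/sh
    # $PBS_HOME/mom_priv/epilogue, run as root after each job ends.
    jobid=$1
    user=$2

    [ -z "$user" ] && exit 0
    [ "$(id -u "$user")" -lt 1000 ] && exit 0  # never sweep system accounts

    # Caveat from the thread: the user may have another legitimate job
    # on this node. Approximate that check by scanning the MOM's job
    # files (default path and naming assumed; verify for your install).
    for f in /var/spool/torque/mom_priv/jobs/*.JB; do
        [ -e "$f" ] || continue
        case "$f" in *"$jobid"*) continue ;; esac  # skip the job that just ended
        grep -aq "$user" "$f" && exit 0            # another job found: leave it be
    done

    pkill -9 -u "$user"    # sweep whatever the finished job left behind
    exit 0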
> > > > Look for "JOBNODEMATCHPOLICY" here: > > > > > http://docs.adaptivecomputing.com/maui/a.fparameters.php > > > > > > > Assigning full nodes to jobs ("EXACTNODE") may or may not be > a > > > > good choice for you. > > > > E.g. you may consider it wasteful, if there are many serial > > > > jobs or parallel jobs running only on a few processors, in > > > > which case you may want to pack those jobs in the fewest > > > > nodes possible ("EXACTPROC"), so as to have a maximum > throughput. > > > > However, "no node sharing" helps preventing cheaters > > > > to bother other users that are running jobs on the same node, > > > > and it is not bad at all if most of the jobs are parallel > > > > and use many cores (say, >= number of cores per node). > > > > > Yes, we use Maui, but I let it at its' default settings since I have no idea > and experience about maui's options. I like EXACTPROC option. Our cluster is > small so that might be better for us. > > > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org iEYEARECAAYFAlESVmoACgkQo/GbGkBRnRpSegCdFyRJFeWa8gbbzfFGIX+MNsu3 ZLIAn1V2OYH+yZ/2lHE0g7MVdHus1bWi =HfkP -----END PGP SIGNATURE-----