On 06.02.2013, at 12:23, Duke Nguyen wrote:

> On 2/6/13 1:03 AM, Gus Correa wrote:
>> On 02/05/2013 08:52 AM, Jeff Squyres (jsquyres) wrote:
>>> To add to what Reuti said, if you enable PBS support in Open MPI, when
>>> users "mpirun ..." in a PBS job, Open MPI will automatically use the
>>> PBS native launching mechanism, which won't let you run outside of the
>>> servers allocated to that job.
>>>
>>> Concrete example: if you qsub a job and are allocated nodes A, B, and
>>> C, but then try to run with "mpirun --host D,E,F ...", you'll get an
>>> error.
>>>
>>> That being said -- keep in mind what Reuti said: if users are allowed
>>> to ssh between nodes that are not allocated to them, then they can
>>> always bypass this behavior and use just Open MPI's ssh support to
>>> launch on nodes D, E, F (etc.).
>>>
>>> On Feb 5, 2013, at 2:46 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>> Can the users use a plain ssh between the nodes? If they are forced
>>>> to use the TM of Torque instead, it should be impossible to start a
>>>> job on a non-granted machine.
>
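(Just to illustrate Jeff's point: you can check whether an existing Open MPI
installation was built with TM support at all, and rebuild it if it wasn't.
The prefixes below are only placeholders for your site.)

  # Does this Open MPI build contain the Torque/TM launcher components?
  # You should see "plm: tm" and "ras: tm" lines if it does.
  ompi_info | grep " tm"

  # If not, rebuild Open MPI against your Torque installation
  # (paths are just examples):
  ./configure --prefix=/opt/openmpi --with-tm=/opt/torque
  make all install
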
> Thanks Reuti and Jeff, you are right, users should not be allowed to ssh
> to all nodes, which is how our cluster was set up: users can even ssh to
> any node without a password. I know this is not an appropriate question
> for an Open MPI forum, but how can we set things up so that users can
> only ssh (with password) to the nodes that are allocated to them at the
> time of qsub'ing? I am still new to all of this cluster stuff :)

I even disallow this. Only admin staff is allowed to log in to the nodes.
This also forces the admins to look for a tight integration of the users'
software into the queuing system.
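One way to implement that (a minimal sketch; the group name is only an
example): limit interactive ssh on the compute nodes to an admin group, and
let all user processes be started through Torque's TM interface, which does
not go through sshd at all.

  # /etc/ssh/sshd_config on each compute node:
  AllowGroups clusteradmin

  # then reload sshd, e.g.:
  service sshd reload

With this in place, an mpirun that tries to fall back to ssh (for example
towards non-granted nodes) simply fails, while TM-launched jobs are
unaffected.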

-- Reuti


>> Besides Reuti's and Jeff's suggestions.
>>
>> If you build your own Torque/PBS with PAM support
>> (./configure --with-pam [other configure flags]),
>> you can prevent users that are not running a Torque/PBS job on a node
>> from launching processes on that node.
>>
>> See this:
>> http://docs.adaptivecomputing.com/torque/4-1-3/help.htm#topics/1-installConfig/customizingTheInstall.htm
>>
>> Of course you will need to rebuild your Open MPI with Torque support
>> again, after you install a version of Torque with PAM support.
>>
>> This is mostly a Torque/Maui issue, with a bit of an MPI issue.
>> You may get more help about this on the Torque and Maui mailing lists,
>> and in their archives you may find more specific guidance on what you
>> need to add to the pam/security files to make it work.
> 
> Thanks for this! I will try this --with-pam when our test job finishes. 
> 
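(For the record, this is roughly what the PAM route involves. The prefix and
the exact PAM lines are assumptions on my side, so double-check them against
the Adaptive Computing page Gus linked.)

  # Rebuild Torque with its PAM module (plus your other configure flags):
  ./configure --with-pam --prefix=/opt/torque
  make && make install

  # This installs pam_pbssimpleauth.so (typically under /lib/security or
  # /lib64/security). On every compute node, the "account" section of
  # /etc/pam.d/sshd then gets entries along these lines:
  #
  #   account sufficient pam_pbssimpleauth.so
  #   account required   pam_access.so
  #
  # i.e. a user passes the account check only while a job of theirs is
  # running on that node; admin accounts can be whitelisted via
  # /etc/security/access.conf.
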
>> Torque with PAM support is not 100% foolproof, because users that *are*
>> running a Torque/PBS job on a node can still cheat and launch more
>> processes there, but it helps restrict the problem to this case.
>>
>> Some sys admins also add a cleanup/sweep routine to the Torque epilogue
>> script to kill any processes belonging to the user whose job just
>> finished. However, this is not very good, because that user may have
>> another legitimate job still running there. Other cleanup strategies
>> are possible, and you may find some suggestions and even scripts if you
>> google around.
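(If you go the epilogue route, here is a very rough sketch that tries to
respect Gus's caveat about other legitimate jobs. All paths, the UID
threshold, and the job-id handling are assumptions and will need adjusting
to your site.)

  #!/bin/sh
  # Sketch of $PBS_HOME/mom_priv/epilogue (root-owned, mode 0700).
  # Torque calls it with the job id in $1 and the job owner in $2.
  jobid="$1"
  user="$2"

  # Never touch system accounts.
  [ "$(id -u "$user" 2>/dev/null || echo 0)" -lt 500 ] && exit 0

  # If the user still owns another job assigned to this node, do nothing.
  for j in $(pbsnodes "$(hostname -s)" | sed -n 's/.*jobs = //p' | tr ',' ' '); do
      j="${j#*/}"                      # strip the leading "cpu/" index
      [ "$j" = "$jobid" ] && continue  # skip the job that just ended
      owner=$(qstat -f "$j" 2>/dev/null | sed -n 's/.*Job_Owner = \([^@]*\)@.*/\1/p')
      [ "$owner" = "$user" ] && exit 0
  done

  # No other job of this user on this node: sweep the leftover processes.
  pkill -9 -u "$user"
  exit 0
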
>> Moreover, if you configure your scheduler (Maui?) to assign full nodes
>> to jobs (no node sharing), the cheaters will be cheating on themselves,
>> not stepping on other users' toes.
>>
>> Look for "JOBNODEMATCHPOLICY" here:
>> http://docs.adaptivecomputing.com/maui/a.fparameters.php
>>
>> Assigning full nodes to jobs ("EXACTNODE") may or may not be a good
>> choice for you. E.g. you may consider it wasteful if there are many
>> serial jobs or parallel jobs running only on a few processors, in which
>> case you may want to pack those jobs into the fewest nodes possible
>> ("EXACTPROC"), so as to get maximum throughput. However, "no node
>> sharing" helps prevent cheaters from bothering other users that are
>> running jobs on the same node, and it is not bad at all if most of the
>> jobs are parallel and use many cores (say, >= the number of cores per
>> node).
> 
> Yes, we use Maui, but I left it at its default settings since I have no
> experience with Maui's options yet. I like the EXACTPROC option. Our
> cluster is small, so that might be better for us.
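(If you later decide to change it, the parameter goes into maui.cfg; the
path below is just a common default, and the maui daemon needs a restart
afterwards.)

  # /usr/local/maui/maui.cfg  (location depends on how Maui was installed)

  # Gus's "no node sharing" variant: give each job whole nodes
  JOBNODEMATCHPOLICY  EXACTNODE

  # ... or the packing variant for a mix of small/serial jobs:
  # JOBNODEMATCHPOLICY  EXACTPROC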
