Well, I did some digging around, and this PR looks like the right solution.
First, the security issue is fine so long as we use the highest level of security that is available. If someone configures the system with munge, then we default to it - if not, we use the next highest one available.

Second, running munge on the IO nodes is not only okay but required by Lustre. Future systems are increasingly going to run the user's job script (including mpirun) on the IO nodes, as this (a) frees up the login node for interactive editing, and (b) avoids the jitter introduced by running the job script on the same node as application procs, or the waste of dedicating a compute node just to run the job script.

People likewise opt not to run munge on the compute nodes to avoid the jitter it introduces. Since many systems consider their users "trusted" once they are given an allocation, authenticating every connection isn't considered mandatory. We'll probably also see increasing use of "lightweight kernels" on the compute nodes, so things like munge may become even less common back there.

So it sounds like we are going to run into a number of these "mixed mode" setups. I'll apply the PR. I've also thought of a way to resolve the reverse problem (where the connection initiator is in the higher security zone), but I'll do that one tomorrow.

HTH
Ralph
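A minimal sketch of that "highest available method wins" selection, written as standalone C with hypothetical method names, priorities, and availability probes (this is not the actual OMPI sec framework API):

    /* Pick the highest-priority authentication method that is usable on
     * this node.  Names, priorities, and probes are illustrative only. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical probe: treat munge as available if its domain socket
     * exists at the common default location. */
    static bool munge_available(void)
    {
        return access("/var/run/munge/munge.socket.2", F_OK) == 0;
    }

    /* The trivial uid/gid exchange needs no daemon, so it always works. */
    static bool basic_available(void)
    {
        return true;
    }

    struct sec_method {
        const char *name;
        int priority;                /* higher wins */
        bool (*available)(void);
    };

    static const struct sec_method methods[] = {
        { "munge", 50, munge_available },
        { "basic", 10, basic_available },
    };

    static const struct sec_method *select_method(void)
    {
        const struct sec_method *best = NULL;
        for (size_t i = 0; i < sizeof(methods) / sizeof(methods[0]); i++) {
            if (methods[i].available() &&
                (best == NULL || methods[i].priority > best->priority)) {
                best = &methods[i];
            }
        }
        return best;
    }

    int main(void)
    {
        const struct sec_method *m = select_method();
        printf("selected sec method: %s\n", m ? m->name : "none");
        return 0;
    }

On a node with the munge daemon running this selects "munge"; on a node without it, it falls back to "basic" - which is exactly the asymmetry that bites when two such nodes try to talk to each other.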
> On Mar 25, 2015, at 7:24 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> I've asked Mark to check with the sys admins as to the logic behind their configuration. I would not immediately presume that they are doing something wrong or that munge is not needed - it could be used for other purposes.
>
> I fully recognize that this change doesn't resolve all problems, but it will resolve at least some of them. Barring a two-sided handshake, there isn't much else that can be done. I'm not convinced that it represents a security risk, but I am having it reviewed.
>
> Improving the error message regardless is a good idea and I'll follow up on it.
>
>> On Mar 25, 2015, at 7:11 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>>
>> Mark,
>>
>> munge is an authentication mechanism based on a secret key shared between hosts. There are both a daemon part and a library/client part.
>>
>> In its simplest form, you can run on node0:
>>
>> echo "hello" | munge | ssh node1 unmunge
>>
>> (see sample output below)
>>
>> If everything is correctly set up (e.g. the same shared secret key, and munge daemons running on both nodes), then node1 will know for sure (well, at least as long as the secret key stays secret ...) that the "hello" message was sent from node0, by this user, at that time, and was not altered. There is also a time-to-live (TTL) for each message, and an anti-replay mechanism.
>>
>> For example, munge can be used by SLURM in order to authenticate messages between compute and head nodes.
>>
>> So at first glance, it makes little sense to have munge running on some nodes and not on others. But since Blue Waters is running Torque (at least, that is what I found on Google), munge might not be needed (in which case it is probably best to have it disabled on all nodes), or it could be used by other services.
>>
>> Ralph,
>>
>> the commit log of PR #497 is:
>> "Let the initiator of the connection determine the method to be used - if the receiver cannot support it, then that's an error that will cause the connection attempt to fail."
>>
>> From a security point of view, and IMHO, that seems wrong to me. (It's like entering a bank and asking: "I have no credit card and no ID, but I ask you to trust me and give me my money.")
>>
>> It also does not handle all cases: if munge is not running on some nodes, and those nodes happen *not* to be the initiator, that will break. A trivial (and dumb) corner case is if mpirun runs on the compute node and the head node is part of the node list ...
>>
>> Back to OMPI, I'd rather have the initiator send its authentication method along with its authentication key, and have the server check that it is using the same authentication method, failing with a user-friendly error message otherwise:
>>
>> (authentication type mismatch: node0 uses munge but node1 uses basic; you should contact your sysadmin, or if you know what you are doing, you can try mpirun -mca sec basic)
>>
>> On Blue Waters, that would mean OMPI does not run out of the box, but fails with an understandable message. That would be less user friendly, but more secure.
>>
>> Any thoughts?
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> [gouaillardet@node0 ~]$ echo coucou | munge | ssh node1 unmunge
>> STATUS: Success (0)
>> ENCODE_HOST: node0 (10.7.67.3)
>> ENCODE_TIME: 2015-03-26 09:55:16 (1427331316)
>> DECODE_TIME: 2015-03-26 09:55:16 (1427331316)
>> TTL: 300
>> CIPHER: aes128 (4)
>> MAC: sha1 (3)
>> ZIP: none (0)
>> UID: gouaillardet (1011)
>> GID: gouaillardet (1011)
>> LENGTH: 7
>>
>> coucou
>>
>> On 2015/03/26 5:59, Mark Santcroos wrote:
>>> Hi Ralph,
>>>
>>>> On 25 Mar 2015, at 21:25, Ralph Castain <r...@open-mpi.org> wrote:
>>>> I think I have this resolved, though I still suspect there is something wrong on that system. You shouldn't have some nodes running munge and others not running it.
>>> For completeness, it's not "some" nodes; it's the MOM (service) nodes that run it, and the compute nodes don't. I don't know munge well enough to judge whether it makes sense to have it there only and not on the compute nodes.
>>>
>>>> I wonder if someone was experimenting and started munge on some of the nodes, and forgot to turn it off afterwards??
>>> If the answer to my request for clarification is along the lines of "No!", then I can ask the admins what's up.
>>>
>>>> Anyway, see if this fixes the problem.
>>>>
>>>> https://github.com/open-mpi/ompi/pull/497
>>> Will get back to you later how that works for me.
>>>
>>> Thanks
>>>
>>> Mark
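A minimal sketch of the mutual check Gilles proposes: the initiator sends the name of its authentication method along with its credential, and the server refuses a mismatch with an explicit, user-friendly error instead of a silent failure. The method names, message fields, and error wording below are hypothetical, not the actual OMPI wire format:

    /* Server-side check of an incoming (method, credential) pair.
     * Illustrative only: real code would parse these fields out of the
     * connection handshake message. */
    #include <stdio.h>
    #include <string.h>

    static int server_check(const char *my_method,
                            const char *peer_method,
                            const char *credential)
    {
        if (strcmp(my_method, peer_method) != 0) {
            fprintf(stderr,
                    "authentication type mismatch: peer uses %s but this "
                    "node uses %s; contact your sysadmin, or if you know "
                    "what you are doing, force a common method with "
                    "mpirun -mca sec %s\n",
                    peer_method, my_method, my_method);
            return -1;
        }
        (void)credential;   /* methods agree: validate the credential here */
        return 0;
    }

    int main(void)
    {
        /* e.g. a munge-enabled MOM node contacting a munge-less compute
         * node, which only supports the basic uid/gid exchange */
        return server_check("basic", "munge", "MUNGE:AwQ...") == 0 ? 0 : 1;
    }

Compared with the "initiator decides" rule in PR #497, this trades the out-of-the-box experience on mixed-mode systems for an explicit, diagnosable failure - the trade-off Gilles describes above.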