I’ve asked Mark to check with the sys admins about the logic behind their 
configuration. I would not immediately presume that they are doing something 
wrong or that munge is not needed - it could be used for other purposes.

I fully recognize that this change doesn’t resolve all problems, but it will 
resolve at least some of them. Barring a two-sided handshake, there isn’t much 
else that can be done. I’m not convinced that it represents a security risk, 
but I am having it reviewed.

Improving the error message regardless is a good idea and I’ll follow up on it.

> On Mar 25, 2015, at 7:11 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
> 
> Mark,
> 
> munge is an authentication mechanism based
> on a secret key shared between hosts.
> 
> there are both a daemon part and a library/client part.
> 
> in its simplest form, you can run on node0 :
> 
> echo "hello" | munge | ssh node1 unmunge
> (see sample output below)
> 
> if everything is correctly set up (i.e. the same shared secret key on
> both hosts and the munge daemons running),
> then node1 will know for sure (well, at least as long as the secret key
> stays secret ...) that
> the "hello" message was sent from node0 by this user, at that time, and
> was not altered.
> there is also a time-to-live (ttl) for each message and an anti-replay
> mechanism.
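> 
> for reference, here is a minimal sketch of the same round trip via the
> libmunge C api (default context, error handling trimmed; the shell
> pipeline above is still the easiest way to test) :
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <munge.h>   /* link with -lmunge */
> 
> int main(void)
> {
>     const char *msg = "hello";
>     char *cred = NULL;
>     void *buf = NULL;
>     int len = 0;
>     uid_t uid;
>     gid_t gid;
> 
>     /* ask the local munged to wrap the payload into a credential
>      * (embeds uid/gid, encode time, ttl and a MAC over the payload) */
>     munge_err_t err = munge_encode(&cred, NULL, msg, (int) strlen(msg));
>     if (err != EMUNGE_SUCCESS) {
>         fprintf(stderr, "munge_encode: %s\n", munge_strerror(err));
>         return 1;
>     }
> 
>     /* normally done on the *other* node : fails if the secret key
>      * differs, the ttl has expired, or the credential was replayed */
>     err = munge_decode(cred, NULL, &buf, &len, &uid, &gid);
>     if (err != EMUNGE_SUCCESS) {
>         fprintf(stderr, "munge_decode: %s\n", munge_strerror(err));
>         return 1;
>     }
> 
>     printf("payload from uid %d : %.*s\n", (int) uid, len, (char *) buf);
>     free(cred);
>     free(buf);
>     return 0;
> }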
> 
> for example, munge can be used by SLURM in order to authenticate
> messages between compute and head nodes.
> 
> 
> so at first glance, it makes little sense to have munge running on some
> nodes and not on others.
> but since blue waters is running torque (at least this is what i found
> on google), munge might not be needed
> (in which case it is probably best to have it disabled on all nodes),
> or it could be used by other services.
> 
> 
> Ralph,
> 
> the commit log of PR #497 is :
> "Let the initiator of the connection determine the method to be used -
> if the receiver cannot support it, then that's an error that will cause
> the connection attempt to fail."
> 
> from a security point of view, imho, that seems wrong.
> (it's like walking into a bank and saying "i have no credit card and no
> id, but i ask you to trust me and give me my money").
> 
> also, that does not handle all cases :
> /* if munge is not running on some nodes, and those nodes happen *not*
> to be the initiator, that will break.
> a trivial (and dumb) corner case is mpirun running on a compute node
> with the head node part of the node list ... */
> 
> back to ompi, i'd rather have the initiator send its authentication
> method along with its authentication key, and have the server
> check it is using the same authentication method, and fail with a
> user-friendly error message otherwise.
> (authentication type mismatch : node0 uses munge but node1 uses basic,
> you should contact your sysadmin,
> or if you know what you are doing, you can try mpirun -mca sec basic)
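> 
> a rough sketch of the check i have in mind on the server side
> (hypothetical names, *not* the actual ompi sec framework api, just the
> shape of it) :
> 
> #include <stdio.h>
> #include <string.h>
> 
> /* the initiator sends the *name* of its sec method along with its
>  * credential; the server refuses to proceed unless both sides agree */
> static int check_auth_method(const char *peer_host,
>                              const char *peer_method,
>                              const char *my_method)
> {
>     if (strcmp(peer_method, my_method) != 0) {
>         fprintf(stderr,
>                 "authentication type mismatch : %s uses %s but this node"
>                 " uses %s\nyou should contact your sysadmin, or if you"
>                 " know what you are doing, you can try"
>                 " mpirun -mca sec %s\n",
>                 peer_host, peer_method, my_method, peer_method);
>         return -1;   /* fail the connection with a clear message */
>     }
>     return 0;        /* same method on both sides : now go verify the
>                       * credential itself */
> }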
> 
> on blue waters, that would mean ompi does not run out of the box, but
> fails with an understandable message.
> that would be less user friendly, but more secure.
> 
> any thoughts ?
> 
> Cheers,
> 
> Gilles
> 
> 
> 
> 
> [gouaillardet@node0 ~]$ echo coucou | munge | ssh node1 unmunge
> STATUS:           Success (0)
> ENCODE_HOST:      node0 (10.7.67.3)
> ENCODE_TIME:      2015-03-26 09:55:16 (1427331316)
> DECODE_TIME:      2015-03-26 09:55:16 (1427331316)
> TTL:              300
> CIPHER:           aes128 (4)
> MAC:              sha1 (3)
> ZIP:              none (0)
> UID:              gouaillardet (1011)
> GID:              gouaillardet (1011)
> LENGTH:           7
> 
> coucou
> 
> On 2015/03/26 5:59, Mark Santcroos wrote:
>> Hi Ralph,
>> 
>>> On 25 Mar 2015, at 21:25 , Ralph Castain <r...@open-mpi.org> wrote:
>>> I think I have this resolved,
>>> though I still suspect there is something wrong on that system. You 
>>> shouldn’t have some nodes running munge and others not running it.
>> For completeness, it's not "some" nodes: it's the MOM (service) nodes that 
>> run it, and the compute nodes don't.
>> I don't know munge well enough to judge whether it makes sense to have it 
>> only there and not on the compute nodes.
>> 
>>> I wonder if someone was experimenting and started munge on some of the 
>>> nodes, and forgot to turn it off afterwards??
>> If the answer to my request for clarification is along the lines of "No!", 
>> then I can ask the admins what's up.
>> 
>>> Anyway, see if this fixes the problem.
>>> 
>>> https://github.com/open-mpi/ompi/pull/497
>> Will get back to you later on how that works for me.
>> 
>> Thanks
>> 
>> Mark
