I’ve asked Mark to check with the sys admins as to the logic behind their configuration. I would not immediately presume that they are doing something wrong or that munge is not needed; it could be used for other purposes.
I fully recognize that this change doesn’t resolve all problems, but it will resolve at least some of them. Barring a two-sided handshake, there isn’t much else that can be done. I’m not convinced that it represents a security risk, but I am having it reviewed. Improving the error message regardless is a good idea and I’ll follow up on it.

> On Mar 25, 2015, at 7:11 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>
> Mark,
>
> munge is an authentication mechanism based on a secret key shared between hosts.
>
> there are both a daemon part and a library/client part.
>
> in its simplest form, you can run on node0 :
>
> echo "hello" | munge | ssh node1 unmunge
> (see sample output below)
>
> if everything is correctly set (e.g. same shared secret key and munge daemons are running)
> then node1 will know for sure (well, at least as long as the secret key stays secret ...)
> the "hello" message was sent from node0 by this user, at that time and was not altered.
> there is also a time-to-live (ttl) for each message and an anti-replay mechanism
>
> for example, munge can be used by SLURM in order to authenticate messages between compute and head nodes.
>
>
> so at first glance, it makes little sense to have munge running on some nodes and not on others.
> but since blue waters is running torque (at least this is what i found on google), munge could be not needed
> (and in this case, it is probably best to have it disabled on all nodes)
> or it could be used by other services.
>
>
> Ralph,
>
> the commit log of PR #497 is :
> "Let the initiator of the connection determine the method to be used -
> if the receiver cannot support it, then that's an error that will cause
> the connection attempt to fail."
>
> from a security point of view, and imho, that seems wrong to me.
> (it's like entering a bank and asking "i have no credit card, no id but i ask you to trust me and give me my money").
>
> also, that does not handle all cases :
> /* if munge is not running on some nodes, and those nodes happen *not* to be the initiator, that will break,
> a trivial (and dumb) corner case is if mpirun runs on the compute node, and the head node is part of the node list ... */
>
> back to ompi, i'd rather have the initiator send its authentication method with its authentication key, and have the server
> check it is using the same authentication method, and fail with a user-friendly error message otherwise.
> (authentication type mismatch : node0 uses munge but node1 uses basic, you should contact your sysadmin
> or if you know what you are doing, you can try mpirun -mca sec basic)
>
> on blue waters, that would mean ompi does not run out of the box, but fails with an understandable message.
> that would be less user friendly, but more secure
>
> any thoughts ?
>
> Cheers,
>
> Gilles
>
>
> [gouaillardet@node0 ~]$ echo coucou | munge | ssh node1 unmunge
> STATUS: Success (0)
> ENCODE_HOST: node0 (10.7.67.3)
> ENCODE_TIME: 2015-03-26 09:55:16 (1427331316)
> DECODE_TIME: 2015-03-26 09:55:16 (1427331316)
> TTL: 300
> CIPHER: aes128 (4)
> MAC: sha1 (3)
> ZIP: none (0)
> UID: gouaillardet (1011)
> GID: gouaillardet (1011)
> LENGTH: 7
>
> coucou
>
> On 2015/03/26 5:59, Mark Santcroos wrote:
>> Hi Ralph,
>>
>>> On 25 Mar 2015, at 21:25 , Ralph Castain <r...@open-mpi.org> wrote:
>>> I think I have this resolved, though I still suspect there is something wrong on that system. You
>>> shouldn’t have some nodes running munge and others not running it.
>> For completeness, it's not "some" nodes, it's the MOM (service) nodes that run it, and the compute nodes don't.
>> I don't know munge well enough to judge whether it makes sense to have it there only and not on the compute nodes?
>>
>>> I wonder if someone was experimenting and started munge on some of the nodes, and forgot to turn it off afterwards??
>> If the answer to my request for clarification is along the lines of "No!", then I can ask the admins what's up.
>>
>>> Anyway, see if this fixes the problem.
>>>
>>> https://github.com/open-mpi/ompi/pull/497
>> Will get back to you later on how that works for me.
>>
>> Thanks
>>
>> Mark
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26533.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26535.php
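
For reference, the receiver-side check Gilles describes in the quoted message (the initiator sends its authentication method along with its credential, and the receiver fails with a readable message on a mismatch) could look roughly like the sketch below. This is only an illustration in Python, not Open MPI code: Credential, AuthMismatchError and check_method are hypothetical names, and the only pieces taken from the thread are the method names "munge" and "basic" and the wording of the suggested error message.

```python
from dataclasses import dataclass


class AuthMismatchError(Exception):
    """Raised when the initiator and the receiver use different methods."""


@dataclass
class Credential:
    # Hypothetical wire format: the initiator tags its credential
    # with the name of the method that produced it.
    method: str     # e.g. "munge" or "basic"
    payload: bytes  # opaque credential, validated elsewhere


def check_method(local_host, local_method, peer_host, cred):
    """Reject the connection early if the two sides disagree on the method.

    Only the method names are compared here; validating the payload
    itself (e.g. decoding a munge credential) is out of scope.
    """
    if cred.method != local_method:
        raise AuthMismatchError(
            f"authentication type mismatch: {peer_host} uses {cred.method} "
            f"but {local_host} uses {local_method}; you should contact your "
            f"sysadmin, or if you know what you are doing, you can try "
            f"mpirun -mca sec {local_method}"
        )
    return cred.payload


if __name__ == "__main__":
    # node0 (initiator) offers munge; node1 (receiver) only supports basic.
    try:
        check_method("node1", "basic", "node0", Credential("munge", b"..."))
    except AuthMismatchError as err:
        print(err)
```

The point of the sketch is only that the mismatch is detected and reported by name on the receiving side, instead of the connection attempt failing without explanation.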