Mark,

Munge is an authentication mechanism based on a secret key shared between hosts.

It has both a daemon part and a library/client part.

In its simplest form, you can run on node0:

echo "hello" | munge | ssh node1 unmunge
(see sample output below)

If everything is set up correctly (i.e. the same shared secret key on both
hosts and the munge daemons running), then node1 will know for sure (well,
at least as long as the secret key stays secret ...) that the "hello"
message was sent from node0 by this user, at that time, and was not
altered. There is also a time-to-live (TTL) for each message and an
anti-replay mechanism.
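
The idea can be sketched like this (a toy illustration only, *not* munge's
actual wire format or crypto; it just assumes an HMAC-SHA256 over the
payload, uid and timestamp under the shared key):

```python
import base64
import hashlib
import hmac
import json
import time

# Toy illustration of the munge idea -- NOT munge's actual format,
# just a sketch assuming an HMAC-SHA256 over (payload, uid, timestamp).
SECRET = b"shared-secret-key"  # munge keeps its real key in /etc/munge/munge.key
TTL = 300                      # seconds, same default TTL as in the output below
_seen = set()                  # toy anti-replay cache

def encode(payload, uid):
    """Wrap a payload with uid, timestamp and a MAC under the shared key."""
    body = json.dumps({"payload": payload, "uid": uid, "time": time.time()})
    mac = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return base64.b64encode((body + "|" + mac).encode()).decode()

def decode(cred):
    """Verify the MAC, the TTL and the anti-replay cache; return the fields."""
    body, mac = base64.b64decode(cred).decode().rsplit("|", 1)
    good = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, good):
        raise ValueError("bad credential (wrong key, or message was altered)")
    fields = json.loads(body)
    if time.time() - fields["time"] > TTL:
        raise ValueError("expired credential (TTL exceeded)")
    if cred in _seen:
        raise ValueError("replayed credential")
    _seen.add(cred)
    return fields
```

A host that does not hold the same SECRET can neither forge nor alter a
credential, which is the whole point.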

For example, munge can be used by SLURM in order to authenticate
messages between compute and head nodes.


So at first glance, it makes little sense to have munge running on some
nodes and not on others.
But since Blue Waters is running Torque (at least, that is what I found
on Google), munge might not be needed
(in which case it is probably best to have it disabled on all nodes),
or it could be used by other services.


Ralph,

The commit log of PR #497 is:
"Let the initiator of the connection determine the method to be used -
if the receiver cannot support it, then that's an error that will cause
the connection attempt to fail."

From a security point of view, and IMHO, that seems wrong.
(It's like walking into a bank and saying "I have no credit card and no
ID, but trust me and give me my money.")

Also, that does not handle all cases: if munge is not running on some
nodes, and those nodes happen *not* to be the initiator, that will break.
A trivial (and dumb) corner case is if mpirun runs on a compute node
and the head node is part of the node list ...

Back to OMPI: I'd rather have the initiator send its authentication
method along with its authentication key, and have the server
check that it is using the same authentication method, failing with a
user-friendly error message otherwise.
("authentication type mismatch: node0 uses munge but node1 uses basic;
you should contact your sysadmin,
or if you know what you are doing, you can try mpirun -mca sec basic")
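
In pseudocode, the server-side check I have in mind would look something
like this (a hypothetical sketch, not actual OMPI code; names and message
wording are mine):

```python
# Hypothetical sketch of the proposed server-side handshake check.
# Not actual OMPI code: the function name and arguments are invented
# here purely to illustrate the fail-fast behavior.
def check_handshake(initiator_method, server_method,
                    initiator="node0", server="node1"):
    """Fail fast with a user-friendly message on auth method mismatch."""
    if initiator_method != server_method:
        raise RuntimeError(
            "authentication type mismatch: "
            f"{initiator} uses {initiator_method} but "
            f"{server} uses {server_method}; "
            "contact your sysadmin, or if you know what you are doing, "
            f"try mpirun -mca sec {server_method}")
    return initiator_method
```

The point is that the mismatch is detected and reported before any
credential is trusted, instead of letting the initiator dictate the method.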

On Blue Waters, that would mean OMPI does not run out of the box, but
fails with an understandable message.
That would be less user-friendly, but more secure.

Any thoughts?

Cheers,

Gilles




[gouaillardet@node0 ~]$ echo coucou | munge | ssh node1 unmunge
STATUS:           Success (0)
ENCODE_HOST:      node0 (10.7.67.3)
ENCODE_TIME:      2015-03-26 09:55:16 (1427331316)
DECODE_TIME:      2015-03-26 09:55:16 (1427331316)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              gouaillardet (1011)
GID:              gouaillardet (1011)
LENGTH:           7

coucou

On 2015/03/26 5:59, Mark Santcroos wrote:
> Hi Ralph,
>
>> On 25 Mar 2015, at 21:25 , Ralph Castain <r...@open-mpi.org> wrote:
>> I think I have this resolved,
>> though that I still suspect their is something wrong on that system. You 
>> shouldn’t have some nodes running munge and others not running it.
> For completeness, it's not "some" nodes, its the MOM (service) nodes that run 
> it, and the compute nodes don't.
> I don't know munge well enough to judge whether it makes sense to have it 
> there only and not on the compute nodes?
>
>> I wonder if someone was experimenting and started munge on some of the 
>> nodes, and forgot to turn it off afterwards??
> If the answer to my request for clarification is along the lines of "No!", 
> then I can ask the admins whats up.
>
>> Anyway, see if this fixes the problem.
>>
>> https://github.com/open-mpi/ompi/pull/497
> Will get back to you later how that works for me.
>
> Thanks
>
> Mark
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/03/26533.php
