Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Gilles Gouaillardet
Mark, thanks for the link. I tried to read between the lines and "found" that in the case of TORQUE+munge, munge might be required only on admin nodes and submission hosts (which could be restricted to login nodes on most systems). On the other hand, SLURM does require munge on compute nodes, ev

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Mark Santcroos
> On 26 Mar 2015, at 16:01, Ralph Castain wrote: >> On Mar 26, 2015, at 1:33 AM, Mark Santcroos wrote: >> Hi guys, >> Thanks for the follow-up. >> It appears that you are ruling out that Munge is required because the system runs TORQUE, but as far as I can see Munge i

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Ralph Castain
> On Mar 25, 2015, at 9:38 PM, Gilles Gouaillardet wrote: > On 2015/03/26 13:00, Ralph Castain wrote: >> Well, I did some digging around, and this PR looks like the right solution. > ok then :-) > following stuff is not directly related to ompi, but you might want to comment on that an

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Ralph Castain
> On Mar 26, 2015, at 1:33 AM, Mark Santcroos wrote: > Hi guys, > Thanks for the follow-up. > It appears that you are ruling out that Munge is required because the system runs TORQUE, but as far as I can see Munge is/can be used by both SLURM and TORQUE. > (http://docs.adaptivec

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Mark Santcroos
Hi Ralph, > On 25 Mar 2015, at 21:59, Mark Santcroos wrote: >> Anyway, see if this fixes the problem. >> https://github.com/open-mpi/ompi/pull/497 Can confirm the fallback now works without explicitly setting it to basic (with the merged changes). Thanks! Mark

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Mark Santcroos
Hi guys, Thanks for the follow-up. It appears that you are ruling out that Munge is required because the system runs TORQUE, but as far as I can see Munge is/can be used by both SLURM and TORQUE. (http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/1-installConfig/serverConfig.htm#usi

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Gilles Gouaillardet
On 2015/03/26 13:00, Ralph Castain wrote: > Well, I did some digging around, and this PR looks like the right solution. ok then :-) The following is not directly related to ompi, but you might want to comment on it anyway ... > Second, the running of munge on the IO nodes is not only okay but

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Ralph Castain
Well, I did some digging around, and this PR looks like the right solution. First, the security issue is fine so long as we use the highest level of security that is available. If someone configures the system with munge, then we default to it - if not, we use the next highest one available. Se

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
I’ve asked Mark to check with the sys admins as to the logic behind their configuration. I would not immediately presume that they are doing something wrong or that munge is not needed - it could be used for other purposes. I fully recognize that this change doesn’t resolve all problems, but it wil

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Gilles Gouaillardet
Mark, munge is an authentication mechanism based on a secret key shared between hosts. There is both a daemon part and a library/client part. In its simplest form, you can run on node0: echo "hello" | munge | ssh node1 unmunge (see sample output below). If everything is correctly set (e.g. sa
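
A minimal version of the round-trip test Gilles describes, assuming munged is running on both hosts with the same shared key (node1 is a placeholder hostname):

    # Encode a payload on this node and decode it on node1; if the shared key
    # and clocks agree, unmunge reports success plus the encoding UID/GID/host.
    echo "hello" | munge | ssh node1 unmunge

    # Purely local self-test (no second host needed):
    munge -n | unmunge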

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
> On Mar 25, 2015, at 1:59 PM, Mark Santcroos wrote: > Hi Ralph, >> On 25 Mar 2015, at 21:25, Ralph Castain wrote: >> I think I have this resolved, though I still suspect there is something wrong on that system. You shouldn’t have some nodes running munge and others not run

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
Hi Ralph, > On 25 Mar 2015, at 21:25, Ralph Castain wrote: > I think I have this resolved, though I still suspect there is something wrong on that system. You shouldn’t have some nodes running munge and others not running it. For completeness, it's not "some" nodes, it's the MOM (servi

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
I think I have this resolved, though I still suspect there is something wrong on that system. You shouldn’t have some nodes running munge and others not running it. I wonder if someone was experimenting and started munge on some of the nodes, and forgot to turn it off afterwards?? Anyway,

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Much appreciated! Interesting problem/configuration :-) > On Mar 25, 2015, at 9:42 AM, Mark Santcroos wrote: >> On 25 Mar 2015, at 17:39, Ralph Castain wrote: >> Not surprising - I’m surprised to find munge on the mom’s node anyway given that you are using Torque. >> I have to

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
> On 25 Mar 2015, at 17:39, Ralph Castain wrote: > Not surprising - I’m surprised to find munge on the mom’s node anyway given that you are using Torque. > I have to finish something else first, and it sounds like you aren’t blocked at the moment. I’ll provide a patch for you to try lat

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Not surprising - I’m surprised to find munge on the mom’s node anyway given that you are using Torque. I have to finish something else first, and it sounds like you aren’t blocked at the moment. I’ll provide a patch for you to try later, if you’re willing. > On Mar 25, 2015, at 9:32 AM, Mark Sa

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
Ok. FYI: > aprun munge -n munge: Error: Unable to access "/var/run/munge/munge.socket.2": No such file or directory Application 23792792 exit codes: 6 Application 23792792 resources: utime ~0s, stime ~1s, Rss ~27304, inblocks ~35, outblocks ~58 > On 25 Mar 2015, at 17:29, Ralph Castain wrote
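
A sketch of how one might compare the MOM/login node against the compute nodes, based on the error above (the socket path is the default one from that message):

    # On the MOM/login node: is munged up, and does a local credential decode?
    ps -ef | grep '[m]unged'
    munge -n | unmunge

    # On a compute node via aprun: both should fail if no munged runs there.
    aprun -n 1 munge -n
    aprun -n 1 ls -l /var/run/munge/munge.socket.2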

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Yeah, what’s happening is that mpirun is picking one security mechanism for authenticating connections, but the backend daemons are picking another, and hence we get the conflict. The weird thing here is that you usually don’t see this kind of mismatch for the very reason you are hitting - it be

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
> On 25 Mar 2015, at 17:06, Ralph Castain wrote: > OHO! You have munge running on the head node, but not on the backends! Ok, so I now know that munge is ... :) It's running on the MOM node (not on the head node): daemon 18800 0.0 0.0 118476 3212 ? Sl 01:27 0:00 /usr/sbin/

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Oh come on, Howard - before you go dumping more components into the system, let’s explore WHY he hit this problem. Geez… > On Mar 25, 2015, at 9:16 AM, Howard Pritchard wrote: > kind of working fine. I don't like users having to add these kinds of specialized --mca settings just to get

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Howard Pritchard
kind of working fine. I don't like users having to add these kinds of specialized --mca settings just to get something to work. Sounds like time for yet another Cray-specific component. 2015-03-25 10:14 GMT-06:00 Ralph Castain: > It’s working just fine, Howard - we found the problem. > On M

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
It’s working just fine, Howard - we found the problem. > On Mar 25, 2015, at 9:12 AM, Howard Pritchard wrote: > Mark, > If you're wanting to use the orte-submit feature, you will need to get mpirun working. > Could you rerun using the mpirun launch method but with --mca oob_base_

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Howard Pritchard
Mark, If you're wanting to use the orte-submit feature, you will need to get mpirun working. Could you rerun using the mpirun launch method but with --mca oob_base_verbose 10 --mca ess_base_verbose 2 set? Also, you may want to make sure you are using the ipogif0 eth device. This can be contro
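
Roughly, the requested re-run would look like the sketch below (./a.out stands in for the test program used earlier in the thread). The oob_tcp_if_include parameter is an assumption on my part for pinning the out-of-band traffic to ipogif0, since the message is cut off before the exact knob is named:

    # Verbose OOB/ESS output as requested:
    mpirun -np 2 --mca oob_base_verbose 10 --mca ess_base_verbose 2 ./a.out

    # Assumed way to restrict the TCP out-of-band channel to the ipogif0 device:
    mpirun -np 2 --mca oob_tcp_if_include ipogif0 ./a.out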

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
> On 25 Mar 2015, at 17:06, Ralph Castain wrote: > OHO! You have munge running on the head node, but not on the backends! I'm all for munching, but what does that mean? ;-) Is that something actively running, or do you mean a library being available or such? > Okay, all you have to do is set the MCA pa

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
OHO! You have munge running on the head node, but not on the backends! Okay, all you have to do is set the MCA param “sec” to “basic” in your environment, or add “-mca sec basic” on your cmd line. > On Mar 25, 2015, at 8:53 AM, Mark Santcroos wrote: > nid25257:09727] sec: munge validate_c
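
Concretely, either form below should work; the environment-variable spelling assumes the usual OMPI_MCA_<param> convention:

    # On the command line:
    mpirun -mca sec basic ./a.out

    # Or in the environment, so every mpirun run picks it up:
    export OMPI_MCA_sec=basic
    mpirun ./a.out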

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
> On 25 Mar 2015, at 16:52, Ralph Castain wrote: > Hmmm…okay, sorry to keep drilling down here, but let’s try adding “-mca sec_base_verbose 100” now > /u/sciteam/marksant/openmpi/installation/bin/mpirun -mca oob_base_verbose 100 -mca sec_base_verbose 100 ./a.out [nid25257:09727] mca:

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Hmmm…okay, sorry to keep drilling down here, but let’s try adding “-mca sec_base_verbose 100” now > On Mar 25, 2015, at 8:51 AM, Mark Santcroos wrote: > marksant@nid25257:~> /u/sciteam/marksant/openmpi/installation/bin/mpirun -mca oob_base_verbose 100 ./a.out > [nid25257:09350] mca: ba

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
marksant@nid25257:~> /u/sciteam/marksant/openmpi/installation/bin/mpirun -mca oob_base_verbose 100 ./a.out [nid25257:09350] mca: base: components_register: registering oob components [nid25257:09350] mca: base: components_register: found loaded component usock [nid25257:09350] mca: base: componen

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Hmmm…well, it will generate some output, so keep the system down to two nodes if you can, just to minimize the chatter. Add “-mca oob_base_verbose 100” to your cmd line. > On Mar 25, 2015, at 8:45 AM, Mark Santcroos wrote: > Hi Ralph, > There is no OMPI in system space and PATH and LD_LI

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
Hi Ralph, There is no OMPI in system space, and PATH and LD_LIBRARY_PATH look good. Any suggestion on how to get more relevant debugging info above the table? Thanks Mark > On 25 Mar 2015, at 16:33, Ralph Castain wrote: > Hey Mark > Your original error flag indicates that you are pickin

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Yeah, I removed the need for this flag on Cray. As I said in my other note, this is a red herring - the issue is in the mismatched libraries. > On Mar 25, 2015, at 8:36 AM, Mark Santcroos wrote: >> On 25 Mar 2015, at 15:46, Howard Pritchard wrote: >> turn off the disable getpwuid. >

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
> On 25 Mar 2015, at 15:46, Howard Pritchard wrote: > turn off the disable getpwuid. That doesn't seem to make a difference. Have there been changes in this area? The last time I checked this, a couple of months ago on Edison, I needed this flag not to get spammed.

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Ralph Castain
Hey Mark, your original error flag indicates that you are picking up a connection from some proc built against a different OMPI installation. It’s a very low-level check that looks for matching version numbers. Not sure who is trying to connect, but that is the problem. Check your LD_LIBRARY_PAT
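
A few quick checks for the mismatched-installation theory (a sketch; ./a.out again stands in for the test binary):

    # Which mpirun comes first in PATH, and which Open MPI version is it?
    which mpirun
    mpirun --version

    # Which libmpi does the binary actually resolve at run time?
    ldd ./a.out | grep -i mpi

    # Anything MPI-related lurking in the library search path?
    echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i mpi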

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Howard Pritchard
Turn off the disable-getpwuid. On Mar 25, 2015 8:14 AM, "Mark Santcroos" wrote: > Hi Howard, >> On 25 Mar 2015, at 14:58, Howard Pritchard wrote: >> How are you building ompi? > My configure is rather straightforward: > ./configure --prefix=$OMPI_PREFIX --disable-getpwuid > Maybe I got

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Mark Santcroos
Hi Howard, > On 25 Mar 2015, at 14:58, Howard Pritchard wrote: > How are you building ompi? My configure is rather straightforward: ./configure --prefix=$OMPI_PREFIX --disable-getpwuid Maybe I got spoiled on Hopper/Edison and I need more explicit configuration on BW ... > Also what happens
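
For reference, the full build sequence behind that configure line would look roughly like this (a sketch; $OMPI_PREFIX is whatever user-owned install location you choose):

    # Configure, build, and install into a user prefix:
    ./configure --prefix=$OMPI_PREFIX --disable-getpwuid
    make -j 8 && make install

    # Make sure this installation is the one found at run time:
    export PATH=$OMPI_PREFIX/bin:$PATH
    export LD_LIBRARY_PATH=$OMPI_PREFIX/lib:$LD_LIBRARY_PATH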

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Howard Pritchard
Mark, How are you building ompi? Also, what happens if you use aprun? I work with ompi on the NERSC Edison and Hopper daily; typically I use aprun, though. You definitely don't need to use CCM, and shouldn't. On Mar 25, 2015 6:00 AM, "Mark Santcroos" wrote: > Hi, > Any users of Open MPI on Bl