Actually, the explanation is much simpler. You probably have more than 8 slots on borgj020, and so your job is simply small enough that we put it all on one host. If you want to force the job to use both hosts, add "--map-by node" to your command line.
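For example (a sketch, not from the original message, assuming the same two-node, eight-rank allocation shown in the quoted session below), the only change is to the launch line:

  $ mpirun -np 8 --map-by node ./hostenv.x | sort -g -k2

With --map-by node, mpirun places ranks round-robin by node instead of filling the first node's slots, so the eight ranks are spread evenly across borgj020 and borgj036.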
> On Jan 15, 2016, at 7:02 AM, Jim Edwards <jedwa...@ucar.edu> wrote:
>
> On Fri, Jan 15, 2016 at 7:53 AM, Matt Thompson <fort...@gmail.com> wrote:
> All,
>
> I'm not too sure if this is an MPI issue, a Fortran issue, or something else,
> but I thought I'd ask the MPI gurus here first since my web search failed me.
>
> There is a chance in the future I might want/need to query an environment
> variable in a Fortran program, namely to figure out which switch a currently
> running process is on (via SLURM_TOPOLOGY_ADDR in my case) and perhaps make a
> "per-switch" communicator.[1]
>
> So, I coded up a boring Fortran program whose only exciting lines are:
>
>   call MPI_Get_Processor_Name(processor_name, name_length, ierror)
>   call get_environment_variable("HOST", host_name)
>
>   write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on processor", trim(processor_name)
>   write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, "of", npes, "is on host", trim(host_name)
>
> I decided to try the HOST environment variable first because it is simple and
> different on each node (I didn't want to take many, many nodes to find the
> point where a switch is traversed). I then grabbed two nodes with 4 processes
> per node and...:
>
> (1046) $ echo "$SLURM_NODELIST"
> borgj[020,036]
> (1047) $ pdsh -w "$SLURM_NODELIST" echo '$HOST'
> borgj036: borgj036
> borgj020: borgj020
> (1048) $ mpifort -o hostenv.x hostenv.F90
> (1049) $ mpirun -np 8 ./hostenv.x | sort -g -k2
> Process 0 of 8 is on host borgj020
> Process 0 of 8 is on processor borgj020
> Process 1 of 8 is on host borgj020
> Process 1 of 8 is on processor borgj020
> Process 2 of 8 is on host borgj020
> Process 2 of 8 is on processor borgj020
> Process 3 of 8 is on host borgj020
> Process 3 of 8 is on processor borgj020
> Process 4 of 8 is on host borgj020
> Process 4 of 8 is on processor borgj036
> Process 5 of 8 is on host borgj020
> Process 5 of 8 is on processor borgj036
> Process 6 of 8 is on host borgj020
> Process 6 of 8 is on processor borgj036
> Process 7 of 8 is on host borgj020
> Process 7 of 8 is on processor borgj036
>
> It looks like MPI_Get_Processor_Name is doing its thing, but the HOST one
> seems to reflect only the first host. My guess is that Open MPI doesn't
> export each process's environment separately to every process, so every rank
> is seeing HOST from process 0.
>
>
> I would guess that what is actually happening is that slurm is exporting all
> of the variables from the host node, including the $HOST variable, and
> overwriting the defaults on the other nodes. You should use the SLURM options
> to limit the list of variables that you export from the host to only those
> that you need.
>
>
> So, I guess my question is: can this be done? Is there an option to Open MPI
> that might do it? Or is this just something MPI doesn't do? Or is my
> Google-fu just too weak to figure out the right search phrase to find the
> answer to this probable FAQ?
>
> Matt
>
> [1] Note, this might be unnecessary, but I got to the point where I wanted to
> see if I *could* do it, rather than *should*.
>
> --
> Matt Thompson
> Man Among Men
> Fulcrum of History
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO
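For the per-switch communicator mentioned in footnote [1] of the quoted message, here is a minimal sketch, not from the original thread. It assumes SLURM_TOPOLOGY_ADDR really is set correctly in each rank's environment (the very thing under discussion here) and that it looks like "topswitch.leafswitch.nodename" with "." separators; the program name, variable names, and the naive string hash are illustrative only:

  program per_switch_comm
    use mpi
    implicit none
    integer :: ierror, myid, color, nlen, i
    integer :: switch_comm, switch_rank
    character(len=256) :: topo_addr

    call MPI_Init(ierror)
    call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierror)

    ! F2003 intrinsic; topo_addr is blank if the variable is not set
    call get_environment_variable("SLURM_TOPOLOGY_ADDR", topo_addr)

    ! Assumption: the value looks like "topswitch.leafswitch.nodename", so
    ! hashing everything before the last "." groups ranks by leaf switch.
    nlen = index(topo_addr, ".", back=.true.) - 1
    if (nlen < 0) nlen = len_trim(topo_addr)

    ! Naive, collision-prone string hash -> non-negative color
    color = 0
    do i = 1, nlen
       color = mod(color * 31 + ichar(topo_addr(i:i)), 65536)
    end do

    ! Ranks that computed the same color end up in the same communicator
    call MPI_Comm_split(MPI_COMM_WORLD, color, myid, switch_comm, ierror)
    call MPI_Comm_rank(switch_comm, switch_rank, ierror)

    write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, &
         "is switch-local rank", switch_rank, "with address", trim(topo_addr)

    call MPI_Comm_free(switch_comm, ierror)
    call MPI_Finalize(ierror)
  end program per_switch_comm

For a per-node communicator the hash is unnecessary: MPI-3's MPI_Comm_split_type with MPI_COMM_TYPE_SHARED already gives you that without touching the environment at all.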