Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread r...@open-mpi.org
No problem - glad you were able to work it out!


> On Oct 5, 2017, at 11:22 PM, Anthony Thyssen wrote:
> 
> Sorry r...@open-mpi.org, as Gilles Gouaillardet pointed out to me, the problem 
> wasn't with OpenMPI but with the specific EPEL implementation (see Redhat 
> Bugzilla 1321154).
> 
> Today the server was able to be taken down for maintenance, and I wanted 
> to try a few things.
> 
> After installing torque-4.2.10-11.el7 from the EPEL Testing Repo, I found that 
> all the nodes were 'down', even though everything appeared to be running and 
> there were no errors in the error logs.
> 
> After a lot of trial, error and research, I eventually (on a whim) decided 
> to remove the "num_node_boards=1" entry from the "torque/server_priv/nodes" 
> file and restart the server & scheduler.   Suddenly the nodes were "free" and 
> my initial test job ran.
> 
> Perhaps the EPEL-Test Torque 4.2.10-11 does not contain NUMA support?
> 
> ALL later tests (with OpenMPI, RHEL SRPM 1.10.6-2 re-compiled with "--with-tm") 
> are now responding to the Torque node allocation correctly and are no longer 
> simply running all the jobs on the first node.
> 
> That is, $PBS_NODEFILE, pbsdsh hostname, and mpirun hostname are 
> all in agreement.
> 
> Thank you all for your help, and for putting up with me.
> 
>   Anthony Thyssen ( System Programmer )
>  --
>   "Around here we've got a name for people what talks to dragons."
>   "Traitor?"  Wiz asked apprehensively.
>   "No.  Lunch." -- Rick Cook, "Wizadry Consulted"
>  --
> 
> 
> On Wed, Oct 4, 2017 at 11:43 AM, r...@open-mpi.org wrote:
> Can you try a newer version of OMPI, say the 3.0.0 release? Just curious to 
> know if we perhaps “fixed” something relevant.
> 
> 
>> On Oct 3, 2017, at 5:33 PM, Anthony Thyssen wrote:
>> 
>> FYI...
>> 
>> The problem is discussed further in 
>> 
>> Redhat Bugzilla: Bug 1321154 - numa enabled torque don't work
>>https://bugzilla.redhat.com/show_bug.cgi?id=1321154 
>> 
>> 
>> I'd seen this previously, as it required me to add "num_node_boards=1" to each 
>> node in /var/lib/torque/server_priv/nodes to get torque to at least work, 
>> specifically by munging the $PBS_NODEFILE (which comes out correct) into a 
>> host list containing the correct "slots=" counts.  But of course, now that I 
>> have compiled OpenMPI using "--with-tm", that should not have been needed, as 
>> the host list is now ignored by OpenMPI in a Torque-PBS environment.
>> 
>> However, it seems that ever since "NUMA" support was added to the Torque RPMs, 
>> it has also caused the current problems, which are still continuing.   The 
>> latest activity is a new EPEL "test" version (August 2017), which I will try 
>> shortly.
>> 
>> Thank you for your help, though I am still open to suggestions for a 
>> replacement.
>> 
>>   Anthony Thyssen ( System Programmer )
>>  --
>>Encryption... is a powerful defensive weapon for free people.
>>It offers a technical guarantee of privacy, regardless of who is
>>running the government... It's hard to think of a more powerful,
>>less dangerous tool for liberty.--  Esther Dyson
>>  --
>> 
>> 
>> 
>> On Wed, Oct 4, 2017 at 9:02 AM, Anthony Thyssen wrote:
>> Thank you Gilles.  At least I now have something to follow though with.
>> 
>> As an FYI, the torque is the pre-built version from the Redhat Extras (EPEL) 
>> archive:
>> torque-4.2.10-10.el7.x86_64
>> 
>> Normally pre-built packages have no problems, but not in this case.
>> 
>> 
>> 
>> 
>> On Tue, Oct 3, 2017 at 3:39 PM, Gilles Gouaillardet wrote:
>> Anthony,
>> 
>> 
>> We had a similar issue reported some time ago (e.g. Open MPI ignores torque 
>> allocation), and after quite some troubleshooting, we ended up with the same 
>> behavior (e.g. pbsdsh is not working as expected).
>> 
>> See https://www.mail-archive.com/users@lists.open-mpi.org/msg29952.html for 
>> the last email.
>> 
>> 
>> From an Open MPI point of view, I would consider the root cause to be with 
>> your torque install.
>> 
>> This case was reported at 
>> http://www.clusterresources.com/pipermail/torqueusers/2016-September/018858.html
>> and no conclusion was reached.
>> 
>> 
>> Cheers,
>> 
>> 
>> Gilles
>> 
>> 

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread Gus Correa

Hi Anthony, Ralph, Gilles, all

As far as I know, for core/processor assignment to user jobs to work,
Torque needs to be configured with cpuset support
(configure --enable-cpuset ...).
That is separate from what OpenMPI does in terms of process binding.
Otherwise, the user processes in the job
will be free to use any cores/processors on the nodes assigned to it.

Some additional work to set up Linux support for cpuset is also needed
for Torque to use it at runtime (create the /dev/cpuset directory and
mount the cpuset file system there).
I do this in the pbs_mom daemon startup script,
but it can be done in other ways:

##
# create and mount /dev/cpuset
if [ ! -e /dev/cpuset ]; then
    mkdir /dev/cpuset
fi

if [ "`mount -t cpuset`x" == "x" ]; then
    mount -t cpuset none /dev/cpuset
fi
##

I don't know if the EPEL Torque package is configured
with cpuset support, but I would guess it is not.
Look at /dev/cpuset in your compute nodes
to see if Torque created anything there.
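As a rough sketch of that check (this assumes Torque's usual convention of
creating per-job cpuset directories under /dev/cpuset/torque; the exact path
may differ on your installation):

##
# is the cpuset file system mounted at all?
mount -t cpuset
# has pbs_mom created any job cpusets? (conventional path, adjust as needed)
ls -l /dev/cpuset/torque 2>/dev/null || echo "no Torque cpusets found"
##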

I don't know either whether OpenMPI can somehow bypass the cores/processors
assigned by Torque to a job (if any), or whether, when Torque is configured
without cpuset support, it can somehow still bind the MPI processes to
cores/processors/sockets/etc.


I hope this helps,
Gus Correa

On 10/06/2017 02:22 AM, Anthony Thyssen wrote:
Sorry r...@open-mpi.org, as Gilles Gouaillardet pointed out to me, the 
problem wasn't with OpenMPI but with the specific EPEL implementation 
(see Redhat Bugzilla 1321154).


Today the server was able to be taken down for maintenance, and I 
wanted to try a few things.


After installing torque-4.2.10-11.el7 from the EPEL Testing Repo, I found 
that all the nodes were 'down', even though everything appeared to be 
running and there were no errors in the error logs.


After a lot of trial, error and research, I eventually (on a whim) 
decided to remove the "num_node_boards=1" entry from the 
"torque/server_priv/nodes" file and restart the server & scheduler.  
Suddenly the nodes were "free" and my initial test job ran.
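For illustration, the kind of nodes-file entry involved might look like the
two lines below; the host name and np count are hypothetical, and only the
num_node_boards=1 setting is the one actually removed. The first line is the
NUMA-style entry that left a node 'down'; the second is the same entry with
num_node_boards removed, after which the node showed as 'free':

n0001 np=24 num_node_boards=1
n0001 np=24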


Perhaps the EPEL-Test Torque 4.2.10-11 does not contain NUMA support?

ALL later tests (with OpenMPI, RHEL SRPM 1.10.6-2 re-compiled with 
"--with-tm") are now responding to the Torque node allocation correctly 
and are no longer simply running all the jobs on the first node.


That is, $PBS_NODEFILE, pbsdsh hostname, and mpirun hostname
are all in agreement.
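As a quick sketch of that three-way comparison (the sort/uniq formatting here
is just an illustration, not from the original post), run from inside a
Torque job script:

sort $PBS_NODEFILE | uniq -c        # slots per node, as allocated by Torque
pbsdsh hostname | sort | uniq -c    # what Torque's TM interface launches
mpirun hostname | sort | uniq -c    # what Open MPI (built --with-tm) launches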


Thank you all for your help, and for putting up with me.

   Anthony Thyssen ( System Programmer )

  --
   "Around here we've got a name for people what talks to dragons."
   "Traitor?"  Wiz asked apprehensively.
   "No.  Lunch."                     -- Rick Cook, "Wizadry Consulted"
  --


On Wed, Oct 4, 2017 at 11:43 AM, r...@open-mpi.org wrote:


Can you try a newer version of OMPI, say the 3.0.0 release? Just
curious to know if we perhaps “fixed” something relevant.



On Oct 3, 2017, at 5:33 PM, Anthony Thyssen  wrote:

FYI...

The problem is discussed further in

Redhat Bugzilla: Bug 1321154 - numa enabled torque don't work
https://bugzilla.redhat.com/show_bug.cgi?id=1321154


I'd seen this previously, as it required me to add
"num_node_boards=1" to each node in
/var/lib/torque/server_priv/nodes to get torque to at least
work, specifically by munging the $PBS_NODEFILE (which comes out
correct) into a host list containing the correct "slots=" counts.
But of course, now that I have compiled OpenMPI using "--with-tm",
that should not have been needed, as the host list is now ignored
by OpenMPI in a Torque-PBS
environment.

However, it seems that ever since "NUMA" support was added to the
Torque RPMs, it has also caused the current problems, which are
still continuing.   The latest activity is a new EPEL "test" version
(August 2017), which I will try shortly.

Thank you for your help, though I am still open to suggestions for
a replacement.

  Anthony Thyssen ( System Programmer )   

 --
   Encryption... is a powerful defensive weapon for free people.
   It offers a technical guarantee of privacy, regardless of who is
   running the government... It's hard to think of a more powerful,
   less dangerous tool for liberty.            --  Esther Dyson
 --



On Wed, Oct 4, 2017 at 9:02 AM, Anthony Thyssen wrote:

[OMPI users] Controlling spawned process

2017-10-06 Thread George Reeke
Dear colleagues,
I need some help controlling where a process spawned with
MPI_Comm_spawn goes.  I am using openmpi-1.10 under CentOS 6.7.
My application is written in C and I am running on a RedBarn
system with a master node (hardware box) that connects to the
outside world and two other nodes connected to it via ethernet and
Infiniband.  There are two executable files, one (I'll call it
"Rank0Pgm") that expects to be rank 0 and does all the I/O and
the other ("RanknPgm") that only communicates via MPI messages.
There are two MPI_Comm_spawns that run just after MPI_Init and
an initial broadcast that shares some setup info, like this:
MPI_Comm_spawn("andmsg", argv, 1, MPI_INFO_NULL,
   hostid, commc, &commd, &sperr);
where "andmsg" is a program that needs to communicate with the
internet and with all the other processes via a new communicator
that will be called commd (and another name for the other one).
   When I run this program with no hostfile and an mpirun line
something like this on a node with 32 cores:
/usr/lib64/openmpi-1.10/bin/mpirun -n 1 Rank0Pgm : -n 28 RanknPgm \
   < InputFile
everything works fine.  I assume the spawns use 2 of the 3 available
cores that I did not ask the program to use.

Now I want to run on the full network, so I make a hostfile like this
(call it "nodes120"):
node0 slots=22 max-slots=22
n0003 slots=40 max-slots=40
n0004 slots=56 max-slots=56
where node0 has 24 cores and I am trying to leave room for my two
spawned processes.  The spawned processes have to be able to contact
the internet, so I make an MPI_INFO with MPI_Info_create and
MPI_Info_set(mpinfo, "host", "node0")
and change the MPI_INFO_NULL in the spawn calls to point to this
new MPI_Info.  (If I leave the MPI_INFO_NULL I get a different
error that is probably not of interest here.)

Now I run the mpirun like above except now with
"--hostfile nodes120" and "-n 116" after the colon.  Now I get this
error:

"There are not enough slots available in the system to satisfy the 1
slots that were requested by the application:
  andmsg
Either request fewer slots for your application, or make more slots
available for use."

I get the same error with "max-slots=24" on the first line of the
hosts file.

Sorry for the length of all that.  Request for help:  How do I set
things up to run my rank 0 program and enough copies of RanknPgm to fill
all but some number of cores on the master hardware node, and all the
other rank n programs on the other hardware "nodes" (boxes of CPUs)?
[My application will do best with the default "by slot" scheduling.]

Suggestions much appreciated.  I am quite convinced my code is OK
in that it runs OK as shown above on one hardware box.  Also runs
on my laptop with 4 cores and "-n 3 RanknPgm" so I guess I don't
even really need to reserve cores for the two spawned processes.
I thought of using old-fashioned 'fork' but I really want the
extra communicators to keep asynchronous messages separated.
The documentation says overloading is OK by default, so maybe
something else is wrong here.

George Reeke






Re: [OMPI users] Controlling spawned process

2017-10-06 Thread r...@open-mpi.org
A couple of things you can try:

* add --oversubscribe to your mpirun cmd line so it doesn’t care how many slots 
there are

* modify your MPI_INFO to be “host”, “node0:22” so it thinks there are more 
slots available

It’s possible that the “host” info processing has a bug in it, but this will 
tell us a little more and hopefully get you running. If you want to bind your 
processes to cores, then add “--bind-to core” to the cmd line.
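As a sketch of how the command line from the earlier post might look with
those flags (program names, hostfile and process counts are taken from that
post; only the added options come from the suggestions above):

/usr/lib64/openmpi-1.10/bin/mpirun --oversubscribe --hostfile nodes120 \
    -n 1 Rank0Pgm : -n 116 RanknPgm < InputFile
# add "--bind-to core" as well if you want the processes bound to cores

The second suggestion is a source change rather than a flag: the existing
MPI_Info_set(mpinfo, "host", "node0") call would become
MPI_Info_set(mpinfo, "host", "node0:22").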



> On Oct 6, 2017, at 1:35 PM, George Reeke  wrote:
> 
> Dear colleagues,
> I need some help controlling where a process spawned with
> MPI_Comm_spawn goes.  I am using openmpi-1.10 under CentOS 6.7.
> My application is written in C and I am running on a RedBarn
> system with a master node (hardware box) that connects to the
> outside world and two other nodes connected to it via ethernet and
> Infiniband.  There are two executable files, one (I'll call it
> "Rank0Pgm") that expects to be rank 0 and does all the I/O and
> the other ("RanknPgm") that only communicates via MPI messages.
> There are two MPI_Comm_spawns that run just after MPI_Init and
> an initial broadcast that shares some setup info, like this:
> MPI_Comm_spawn("andmsg", argv, 1, MPI_INFO_NULL,
>   hostid, commc, &commd, &sperr);
> where "andmsg" is a program that needs to communicate with the
> internet and with all the other processes via a new communicator
> that will be called commd (and another name for the other one).
>   When I run this program with no hostfile and an mpirun line
> something like this on a node with 32 cores:
> /usr/lib64/openmpi-1.10/bin/mpirun -n 1 Rank0Pgm : -n 28 RanknPgm \
>   < InputFile
> everything works fine.  I assume the spawns use 2 of the 3 available
> cores that I did not ask the program to use.
> 
> Now I want to run on the full network, so I make a hostfile like this
> (call it "nodes120"):
> node0 slots=22 max-slots=22
> n0003 slots=40 max-slots=40
> n0004 slots=56 max-slots=56
> where node0 has 24 cores and I am trying to leave room for my two
> spawned processes.  The spawned processes have to be able to contact
> the internet, so I make an MPI_INFO with MPI_Info_create and
> MPI_Info_set(mpinfo, "host", "node0")
> and change the MPI_INFO_NULL in the spawn calls to point to this
> new MPI_Info.  (If I leave the MPI_INFO_NULL I get a different
> error that is probably not of interest here.)
> 
> Now I run the mpirun like above except now with
> "--hostfile nodes120" and "-n 116" after the colon.  Now I get this
> error:
> 
> "There are not enough slots available in the system to satisfy the 1
> slots that were requested by the application:
>  andmsg
> Either request fewer slots for your application, or make more slots
> available for use."
> 
> I get the same error with "max-slots=24" on the first line of the
> hosts file.
> 
> Sorry for the length of all that.  Request for help:  How do I set
> things up to run my rank 0 program and enough copies of RanknPgm to fill
> all but some number of cores on the master hardware node, and all the
> other rank n programs on the other hardware "nodes" (boxes of CPUs)?
> [My application will do best with the default "by slot" scheduling.]
> 
> Suggestions much appreciated.  I am quite convinced my code is OK
> in that it runs OK as shown above on one hardware box.  Also runs
> on my laptop with 4 cores and "-n 3 RanknPgm" so I guess I don't
> even really need to reserve cores for the two spawned processes.
> I thought of using old-fashioned 'fork' but I really want the
> extra communicators to keep asynchronous messages separated.
> The documentation says overloading is OK by default, so maybe
> something else is wrong here.
> 
> George Reeke
> 
> 
> 
> 

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users