[OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-01 Thread Lane, William
I'm having problems running OpenMPI jobs
(using a hostfile) on an HPC cluster running
ROCKS on CentOS 6.3. I'm running OpenMPI
outside of Sun Grid Engine (i.e. it is not submitted
as a job to SGE). The program being run is a LAPACK
benchmark. The command line I'm
using to run the jobs is:

$MPI_DIR/bin/mpirun -np $NSLOTS -bind-to-core -report-bindings --hostfile 
hostfile --mca btl_tcp_if_include eth0 --prefix $MPI_DIR 
$BENCH_DIR/$APP_DIR/$APP_BIN

Where MPI_DIR=/hpc/apps/mpi/openmpi/1.8.2/
NSLOTS=128

I'm getting errors of the following form, and OpenMPI never runs the LAPACK benchmark:

--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  csclprd3-0-11

This usually is due to not having the required NUMA support installed
on the node. In some Linux distributions, the required support is
contained in the libnumactl and libnumactl-devel packages.
This is a warning only; your job will continue, though performance may be
degraded.
--------------------------------------------------------------------------

--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE
   Node:        csclprd3-0-11
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------

The only installed numa packages are:
numactl.x86_64    2.0.7-3.el6    @centos6.3-x86_64-0

When I search for the available NUMA packages I find:

yum search numa | less

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
=============================== N/S Matched: numa ===============================
numactl-devel.i686 : Development package for building Applications that use numa
numactl-devel.x86_64 : Development package for building Applications that use numa
numad.x86_64 : NUMA user daemon
numactl.i686 : Library for tuning for Non Uniform Memory Access machines
numactl.x86_64 : Library for tuning for Non Uniform Memory Access machines

Do I need to install additional and/or different NUMA packages in order to get 
OpenMPI to work
on this cluster?

-Bill Lane


Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-01 Thread Ralph Castain
The warning about binding to memory is due to not having numactl-devel
installed on the system. The job would still run, but we are warning you
that we cannot bind memory to the same domain as the core where we bind the
process. That can cause poor performance, but it is not fatal. I forget the
name of the param, but you can tell us to "shut up" :-)
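For what it's worth, getting that support onto the nodes would look roughly
like the following on CentOS 6 (an untested sketch - the cluster-wide form
uses the ROCKS "rocks run host" command and may need adjusting for your
frontend setup):

  # on a single node
  yum -y install numactl numactl-devel

  # or across all compute nodes from the ROCKS frontend
  rocks run host compute command='yum -y install numactl numactl-devel'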

The other warning/error indicates that we aren't seeing enough cores on the
allocation you gave us via the hostfile to support one proc/core - i.e., we
didn't find at least 128 cores in the sum of the nodes you told us about. I
take it you were expecting that there were that many or more?
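As a quick sanity check on your end, something like this (an untested
sketch, assuming your hostfile uses "slots=N" entries) should print the
total slot count declared in the hostfile, which you can then compare
against the number of physical cores you actually have:

  grep -v '^#' hostfile | \
    awk '{for (i=1;i<=NF;i++) if ($i ~ /^slots=/) {sub("slots=","",$i); n+=$i}} END {print n}'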

Ralph


On Wed, Apr 1, 2015 at 12:54 AM, Lane, William wrote:

> [original message quoted in full; trimmed]


Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-01 Thread Lane, William
Ralph,

Here's the associated hostfile:

#openMPI hostfile for csclprd3
#max slots prevents oversubscribing csclprd3-0-9
csclprd3-0-0 slots=12 max-slots=12
csclprd3-0-1 slots=6 max-slots=6
csclprd3-0-2 slots=6 max-slots=6
csclprd3-0-3 slots=6 max-slots=6
csclprd3-0-4 slots=6 max-slots=6
csclprd3-0-5 slots=6 max-slots=6
csclprd3-0-6 slots=6 max-slots=6
csclprd3-0-7 slots=32 max-slots=32
csclprd3-0-8 slots=32 max-slots=32
csclprd3-0-9 slots=32 max-slots=32
csclprd3-0-10 slots=32 max-slots=32
csclprd3-0-11 slots=32 max-slots=32
csclprd3-0-12 slots=12 max-slots=12
csclprd3-0-13 slots=24 max-slots=24
csclprd3-0-14 slots=16 max-slots=16
csclprd3-0-15 slots=16 max-slots=16
csclprd3-0-16 slots=24 max-slots=24
csclprd3-0-17 slots=24 max-slots=24
csclprd3-6-1 slots=4 max-slots=4
csclprd3-6-5 slots=4 max-slots=4

The number of slots also includes hyperthreading
cores.
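
(For anyone checking along: one rough way to see physical cores vs.
hyperthreads on a node, assuming lscpu from util-linux is available, is

  lscpu | egrep 'Socket|Core|Thread'

which reports sockets, cores per socket, and threads per core.)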

One more question: would not having swap partitions defined on all the
nodes in the ring cause OpenMPI to crash? No swap partitions are defined
for any of the above systems.

-Bill L.



From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: Wednesday, April 01, 2015 5:04 AM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

[quoted text from earlier in the thread trimmed]

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-01 Thread Ralph Castain
Bingo - you said the magic word. This is a terminology issue. When we say
"core", we mean the old definition of "core", not "hyperthreads". If you
want to use HTs as your base processing unit and bind to them, then you
need to specify --bind-to hwthread. That warning should then go away.
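
Adapting your original command line, that would look something like this
(untested, same environment variables as before):

  $MPI_DIR/bin/mpirun -np $NSLOTS --bind-to hwthread --report-bindings \
      --hostfile hostfile --mca btl_tcp_if_include eth0 --prefix $MPI_DIR \
      $BENCH_DIR/$APP_DIR/$APP_BIN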

We don't require that a swap region be mounted. I didn't see anything in your
original message indicating that OMPI had actually crashed - it just wasn't
launching due to the above issue. Were you actually seeing crashes as well?


On Wed, Apr 1, 2015 at 8:31 AM, Lane, William  wrote:

> [quoted text trimmed]

Re: [OMPI users] 1.8.4 behaves completely different from 1.6.5

2015-04-01 Thread Thomas Klimpel
> 2. Unable to resolve: can you be more specific on this?

This was my mistake. I used "xxx.yyy.zzz" instead of "localhost" in the
startup options for orterun. (More precisely the GUI did it, but I knew
that code.) No idea how 1.6.5 managed to get around the fact that not even
"dig xxx.yyy.zzz" can resolve this hostname. All the other servers were
specified by their ip address, so no need to resolve anything there.


> 3. Host key verification failed: this likely means an ssh
misconfiguration somewhere on your machines.

You are right, only the master could do a passwordless ssh to the workers,
but the workers could not do a passwordless ssh to the master (or to any
other worker). I manually enabled this between 3 selected workers, and
checked that everything worked fine then. But my method to enable this
manually is time consuming, so now I use "-mca plm_ssh_no_tree_spawn 1" as
an option to orterun instead.
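
(For anyone reading along: a typical way to enable passwordless ssh by hand -
not necessarily exactly what I did, and assuming home directories are shared
across the nodes - is something like

  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa        # only if no key exists yet
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys

which gets tedious once more than a few workers are involved.)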

Thanks for the help. This enabled me to do the tests I wanted to do.


> 1. Ctrl-Z issues. For the moment "don't do that".

As said, I use "kill -SIGSTOP 12345" instead now. Even if the shell would
not freeze, and orterun would stop (after first forwarding the signal to
all workers, which seems to be the most reasonable behavior to me), I would
still have to use "kill -SIGSTOP 12345" (because I don't want to pause the
workers, only the master). I verified that this triggers the crash reliable
for me with 1.6.5.
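
The 12345 above is just a placeholder PID. As an illustration, and assuming
pgrep from procps is available, pausing and later resuming only the master
orterun process looks like:

  kill -SIGSTOP "$(pgrep -o -f orterun)"   # pause the master only
  sleep 30
  kill -SIGCONT "$(pgrep -o -f orterun)"   # resume it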

I cannot reproduce my crash with 1.8.4, but I'm not sure what I learn from
this. Maybe the new "[warn] opal_libevent2021_event_base_loop: reentrant
invocation. Only one event_base_loop can run on each event_base at once."
warning tries to tell me that I'm using MPI_THREAD_MULTIPLE incorrectly.
But I radically simplified my mpi calls for this test now, such that I only
call MPI_Send and MPI_Recv, and only on MPI_COMM_WORLD. But I still get the
warning with 1.8.4, and still can produce my crash with 1.6.5, and still
cannot reproduce my crash with 1.8.4. Is it really possible that
MPI_THREAD_MULTIPLE had a bug (the clusters where this bug can be triggered
have InfiniBand interconnect) in 1.6.5, which is fixed in 1.8.4?

I still fear that the bug is somewhere else in my software (because of the
history of this bug and how hard it often was to trigger it in the past).


Re: [OMPI users] 1.8.4 behaves completely different from 1.6.5

2015-04-01 Thread Ralph Castain
I know 1.8.4 is better than 1.6.5 in some regards, but I obviously can't
say if we fixed the specific bug you're referring to in your software. As
you know, thread bugs are really hard to nail down.

That event_base_loop warning could be flagging a known problem in the
openib module during inter-process connection formation. It's been on our
radar for a while, but we've lacked the cycles to resolve it. You might double-check
by running with "--mca btl ^openib" to see if that is the source of the
warning - I know it will run a lot slower, but you *might* get an
indication as to whether this is or isn't the issue.

Does it only crash when you pause it? Or does it crash while normally
running?


On Wed, Apr 1, 2015 at 12:09 PM, Thomas Klimpel wrote:

> [quoted text trimmed]


Re: [OMPI users] 1.8.4 behaves completely different from 1.6.5

2015-04-01 Thread Thomas Klimpel
> You might double-check by running with "--mca btl ^openib" to see if
> that is the source of the warning

The warning always appears, independently of the interconnect, even when
running with "--mca btl ^openib".


> Does it only crash when you pause it? Or does it crash while normally
> running?

It is very hard to reproduce without pausing. It only crashes in about 1 out
of 5 runs, after half an hour, for a run that would take 36 hours. Smaller
test cases seem to never crash on their own, but when I pause, even quite
small test cases (less than a minute) crash, if I have more than 72 workers.


Re: [OMPI users] 1.8.4 behaves completely different from 1.6.5

2015-04-01 Thread Ralph Castain
Would it be possible to get a backtrace from one of the crashes? It would
be especially helpful if you can add --enable-debug to the OMPI config.
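
Roughly, that would look like this (paths, process count, and core-file name
are just examples - adjust for your setup):

  # rebuild Open MPI with debugging symbols
  ./configure --prefix=$HOME/openmpi-1.8.4-debug --enable-debug
  make -j8 && make install

  # allow core dumps, reproduce the crash, then pull a backtrace
  ulimit -c unlimited
  gdb /path/to/your_app core.12345 -ex 'thread apply all bt' -ex 'quit'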


On Wed, Apr 1, 2015 at 1:09 PM, Thomas Klimpel wrote:

> [quoted text trimmed]