Re: [OMPI users] OpenMPI non blocking I_Allreduce segfaults when using custom function..

2015-12-17 Thread Udayanga Wickramasinghe
I tried with 1.10.0, and it is still failing. I will need to check whether it
works in later releases.

Thanks
Udayanga


On Wed, Dec 16, 2015 at 5:24 PM, Nathan Hjelm  wrote:

>
> I think this is fixed in the 1.10 series. We will not be making any more
> updates to the 1.8 series so you will need to update to 1.10 to get the
> fix.
>
> -Nathan
>
> On Wed, Dec 16, 2015 at 02:48:45PM -0500, Udayanga Wickramasinghe wrote:
> >Hi all,
> >I have a custom MPI_Op function which I use within a non-blocking version
> >of allreduce (MPI_Iallreduce). When executing the MPI program I see a
> >segfault thrown from libNBC. It seems this is a known issue in Open MPI,
> >at least according to [1]. Has this been fixed in a later release of
> >Open MPI? I am using 1.8.4.
> >Thanks
> >Udayanga
> >[1] http://www.open-mpi.org/community/lists/devel/2014/04/14588.php


Re: [OMPI users] OpenMPI non blocking I_Allreduce segfaults when using custom function..

2015-12-17 Thread Gilles Gouaillardet

The v1.10 series was fixed starting with 1.10.1.

Cheers,

Gilles

$ git log --grep=57d3b832972a9d914a7c2067a526dfa3df1dbb34
commit e1ceb4e5f9dadb44edb77662a13058c9b3746505
Author: Nathan Hjelm 
Date:   Fri Oct 2 10:35:21 2015 -0600

op: allow user operations in ompi_3buff_op_reduce

This commit allows user operations to be used in the
ompi_3buff_op_reduce function. This fixes an issue identified in:

http://www.open-mpi.org/community/lists/devel/2014/04/14586.php

and

http://www.open-mpi.org/community/lists/users/2015/10/27769.php

The fix is to copy source1 into the target then call the user op
function with source2 and target.

Fixes #966

(cherry picked from commit open-mpi/ompi@57d3b832972a9d914a7c2067a526dfa3df1dbb34)


Signed-off-by: Nathan Hjelm 
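
For reference, here is a minimal, self-contained sketch of the pattern being
discussed in this thread: a user-defined MPI_Op passed to MPI_Iallreduce. The
my_max reduction below is a made-up example, not code from the original report;
this usage pattern segfaulted in libNBC on affected releases (1.8.x and 1.10.0)
and works from 1.10.1 on.

#include <mpi.h>

/* Hypothetical user-defined reduction: element-wise maximum of doubles. */
static void my_max(void *in, void *inout, int *len, MPI_Datatype *dtype)
{
    double *a = (double *)in;
    double *b = (double *)inout;
    for (int i = 0; i < *len; i++)
        if (a[i] > b[i]) b[i] = a[i];
}

int main(int argc, char **argv)
{
    double in[4] = {1.0, 2.0, 3.0, 4.0}, out[4];
    MPI_Op op;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Op_create(my_max, 1 /* commutative */, &op);      /* custom MPI_Op */
    MPI_Iallreduce(in, out, 4, MPI_DOUBLE, op, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);                    /* complete the NBC */
    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}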






Re: [OMPI users] Questions about non-blocking collective calls...

2015-12-17 Thread Eric Chamberland

Hi Gilles,

On 2015-10-21 20:31, Gilles Gouaillardet wrote:

#3 difficult question ...
first, keep in mind there is currently no progress thread in Open MPI.
that means messages can be received only when MPI_Wait* or MPI_Test* is
invoked. you might hope messages are received when doing some
computation (overlap of computation and communication) but unfortunately,
that does not happen most of the time.


I think you have pointed out a problem with the way we use MPI in our code.

In fact, we assumed that message progression was happening.  Since, for 
example, we wrapped all the MPI_Irecv and MPI_Isend calls in a class, we 
never force the MPI_Isend to progress until the class destructor is 
called.  But if the communication class is an attribute of another 
class, it can be destroyed very late in the program execution, 
leading to a deadlock...


So I have started to modify our classes to invoke MPI_Wait to make 
progression happen.


No, I put an #ifdef around that code to be able to activate/deactivate it.

But I would like to know whether the MPI I am using is able to do message 
progression or not.  How can an end user like me find that out?  
Does it depend on the hardware?  Is there a #define provided by Open MPI 
that one can use in his code?


Thanks,

Eric



Re: [OMPI users] Questions about non-blocking collective calls...

2015-12-17 Thread Jeff Squyres (jsquyres)
On Dec 17, 2015, at 8:57 AM, Eric Chamberland 
 wrote:
> 
> But I would like to know whether the MPI I am using is able to do message 
> progression or not. How can an end user like me find that out? Does it 
> depend on the hardware? Is there a #define provided by Open MPI that one can use in his 
> code?

An MPI program *must* call MPI_Test or MPI_Wait to complete a non-blocking 
request -- it's not optional.

For performance portability, it's likely a good idea to have some calls to 
MPI_Test*() periodically.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
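
As an illustration of the advice above, here is a minimal sketch of calling
MPI_Test periodically so the library can progress an outstanding MPI_Isend
while the application computes. This is a generic example, not code from this
thread; compute_one_chunk() is a hypothetical placeholder for application work.

#include <mpi.h>

void compute_one_chunk(void);   /* hypothetical application work */

void send_with_overlap(const double *buf, int count, int dest,
                       MPI_Comm comm, int nchunks)
{
    MPI_Request req;
    int done = 0;

    MPI_Isend(buf, count, MPI_DOUBLE, dest, 0 /* tag */, comm, &req);

    for (int i = 0; i < nchunks; i++) {
        compute_one_chunk();                           /* do some work */
        if (!done)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* let MPI progress */
    }

    /* Completing the request with MPI_Wait (or a successful MPI_Test)
       is still mandatory before reusing or freeing buf. */
    if (!done)
        MPI_Wait(&req, MPI_STATUS_IGNORE);
}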



Re: [OMPI users] Questions about non-blocking collective calls...

2015-12-17 Thread Eric Chamberland

On 2015-12-17 12:45, Jeff Squyres (jsquyres) wrote:

On Dec 17, 2015, at 8:57 AM, Eric Chamberland 
 wrote:


But I would like to know whether the MPI I am using is able to do message 
progression or not. How can an end user like me find that out? Does it depend 
on the hardware? Is there a #define provided by Open MPI that one can use in his code?


An MPI program *must* call MPI_Test or MPI_Wait to complete a non-blocking 
request -- it's not optional.

Just to be clear: we *always* call MPI_Wait.  Now the question was about 
*when* to do it.


We did 2 different things:

#1- ASAP after the MPI_Isend
#2- As late as possible, in a class destructor for example, which can 
occur a while after other MPI_Irecv and MPI_Isend pairs have been issued.


Is it true that, if there is message progression, the receiving 
side can complete the MPI_Wait linked to the MPI_Irecv call even if 
the sending side has *not yet* called the MPI_Wait linked to the 
MPI_Isend?



For performance portability, it's likely a good idea to have some calls to 
MPI_Test*() periodically.



Interesting and easy to do for us...

Thanks,

Eric



Re: [OMPI users] Questions about non-blocking collective calls...

2015-12-17 Thread Jeff Squyres (jsquyres)
On Dec 17, 2015, at 1:39 PM, Eric Chamberland 
 wrote:
> 
> Just to be clear: we *always* call MPI_Wait.  Now the question was about 
> *when* to do it.

Ok.  Remember that the various flavors of MPI_Test are acceptable, too.  And 
it's ok to call MPI_Test*/MPI_Wait* with MPI_REQUEST_NULL (i.e., if an earlier 
Test/Wait completed a request and set it to MPI_REQUEST_NULL).
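
For example (a generic illustration, not code from this thread; per the MPI
standard, Test/Wait on a null request return immediately):

MPI_Request req = MPI_REQUEST_NULL;   /* e.g. set by an earlier completed Wait/Test */
int flag = 0;
MPI_Test(&req, &flag, MPI_STATUS_IGNORE);   /* sets flag = true, empty status */
MPI_Wait(&req, MPI_STATUS_IGNORE);          /* returns immediately */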

> We did 2 different things:
> 
> #1- ASAP after the MPI_Isend
> #2- As late as possible, in a class destructor for example, which can occur a 
> while after other MPI_Irecv and MPI_Isend pairs have been issued.
> 
> Is it true that, if there is message progression, the receiving side 
> can complete the MPI_Wait linked to the MPI_Irecv call even if the sending 
> side has *not yet* called the MPI_Wait linked to the MPI_Isend?

That is certainly possible, yes.  It depends on a bunch of factors, such as the 
underlying networking hardware, the length of the message, etc.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI users] Need help resolving "error obtaining device context for mlx4_0"

2015-12-17 Thread Bathke, Chuck
Hi,
   I have a system of AMD blades that I am trying to run MCNP6 on using 
Open MPI. I installed openmpi-1.6.5. I also have installed the Intel Fortran and C 
compilers. I compiled MCNP6 using FC="mpif90" CC="mpicc" ... It runs just fine 
when I run it on a 1-hour test case on just one blade. I need to run it on 
several blades, but it issues an error and crashes and burns. I have sought 
help here, but no one seems to know how to fix it. I have mounted /opt and 
/home on bud and bud6 on the corresponding /opt and /home on bud4, at their 
suggestion. That did not fix anything. Please look at the attached file 
(created with bud4>tar -zcf info.tgz mpihT3) that holds the data that is 
requested at https://www.open-mpi.org/community/help/ and in bullet 13 on 
https://www.open-mpi.org/community/help/ . Can you look at it and suggest a 
solution? I suspect that it is something trivial that does not stand out and 
say, "look here you idiot." Thanks.


Charles "Chuck" Bathke

MS-C921
Los Alamos National Laboratory
P.O. Box 1663
Los Alamos, NM 87545
Phone:(505)667-7214
Cell:(505)695-5709
Fax: 505-665-2897
Location: TA-16, Building 0200, Room 125
NEN-5 Group Office: 505-667-0914



(Attachment: info.tgz)


Re: [OMPI users] Need help resolving "error obtaining device context for mlx4_0"

2015-12-17 Thread Ralph Castain
You might want to check the permissions on the MLX device directory - according 
to that error message, the permissions are preventing you from accessing the 
device. Without getting access, we don’t have a way to communicate across nodes 
- you can run on one node using shared memory, but not multiple nodes.

So it looks like there is some device-level permissions issue in play.
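
A quick way to check that (a generic sketch; the exact device files depend on 
your system, so compare with Gilles's listing later in this thread):

# Run on each compute node as the user that launches the MPI job.
ls -l /dev/infiniband/
# On a working node the uverbs* and rdma_cm entries are typically
# world-readable/writable (crw-rw-rw-). If the files exist but your user
# cannot open them, that matches the "error obtaining device context for
# mlx4_0" message.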




Re: [OMPI users] Need help resolving "error obtaining device context for mlx4_0"

2015-12-17 Thread Bathke, Chuck
Ralph,
Where would these be, in /dev?
Chuck




Re: [OMPI users] Need help resolving "error obtaining device context for mlx4_0"

2015-12-17 Thread Ralph Castain
To be honest, it’s been a very long time since I had an IB machine. Howard, 
Nathan, or someone who has one - can you answer?






Re: [OMPI users] Need help resolving "error obtaining device context for mlx4_0"

2015-12-17 Thread Gilles Gouaillardet

There is some stuff in /dev, and also in /sys

On my system:

ls -al /dev/infiniband/
drwxr-xr-x  2 root root  120 Nov 10 17:08 .
drwxr-xr-x 21 root root 3980 Dec 13 03:09 ..
crw-rw  1 root root 231,  64 Nov 10 17:08 issm0
crw-rw-rw-  1 root root  10,  56 Nov 10 17:08 rdma_cm
crw-rw  1 root root 231,   0 Nov 10 17:08 umad0
crw-rw-rw-  1 root root 231, 192 Nov 10 17:08 uverbs0


Here is what you can do to find out what is going wrong on your system
(note: if you are running SELinux, that might also cause some issues):


$ mpirun -np 1 strace -e open,stat -o /tmp/hello.strace -- ./hello_c
Hello, world, I am 0 of 1, (Open MPI v3.0.0a1, package: Open MPI 
gilles@xxx Distribution, ident: 3.0.0a1, repo rev: dev-3197-g4323016, 
Unreleased developer copy, 160)


$ grep -v ENOENT /tmp/hello.strace  | grep /dev/
open("/dev/shm/open_mpi.", 
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 6

open("/dev/infiniband/uverbs0", O_RDWR) = 17
open("/dev/infiniband/uverbs0", O_RDWR) = 19
open("/dev/infiniband/rdma_cm", O_RDWR) = 21
open("/dev/infiniband/rdma_cm", O_RDWR) = 21
open("/dev/infiniband/rdma_cm", O_RDWR) = 21
open("/dev/infiniband/rdma_cm", O_RDWR) = 21

$ grep -v ENOENT /tmp/hello.strace  | grep /sys/
open("/sys/devices/system/cpu/possible", O_RDONLY) = 18
stat("/sys/class/infiniband", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
open("/sys/class/infiniband", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) 
= 17

open("/sys/class/infiniband_verbs/abi_version", O_RDONLY) = 17
open("/sys/class/infiniband_verbs", 
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 17
stat("/sys/class/infiniband_verbs/abi_version", {st_mode=S_IFREG|0444, 
st_size=4096, ...}) = 0
stat("/sys/class/infiniband_verbs/uverbs0", {st_mode=S_IFDIR|0755, 
st_size=0, ...}) = 0

open("/sys/class/infiniband_verbs/uverbs0/ibdev", O_RDONLY) = 18
open("/sys/class/infiniband_verbs/uverbs0/abi_version", O_RDONLY) = 18
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/vendor", O_RDONLY) = 17
open("/sys/class/infiniband_verbs/uverbs0/device/device", O_RDONLY) = 17
open("/sys/class/infiniband/mlx4_0/node_type", O_RDONLY) = 17
open("/sys/class/infiniband/mlx4_0/device/local_cpus", O_RDONLY) = 17
open("/sys/class/infiniband/mlx4_0/ports/1/gids/0", O_RDONLY) = 19
open("/sys/class/misc/rdma_cm/abi_version", O_RDONLY) = 19
open("/sys/class/infiniband/mlx4_0/node_guid", O_RDONLY) = 19

Cheers,

Gilles


Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-17 Thread Jingchao Zhang
The "mpirun --hetero-nodes -bind-to core -map-by core" resolves the performance 
issue!


I reran my test in the *same* job.

SLURM resource request:

#!/bin/sh
#SBATCH -N 4
#SBATCH -n 64
#SBATCH --mem=2g
#SBATCH --time=02:00:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out


env | grep SLURM:

SLURM_CHECKPOINT_IMAGE_DIR=/lustre/work/swanson/jingchao/mpitest/examples/1.10.1/3
SLURM_NODELIST=c[3005,3011,3019,3105]
SLURM_JOB_NAME=submit
SLURMD_NODENAME=c3005
SLURM_TOPOLOGY_ADDR=s0.s5.c3005
SLURM_PRIO_PROCESS=0
SLURM_NODE_ALIASES=(null)
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node
SLURM_NNODES=4
SLURM_JOBID=5462202
SLURM_NTASKS=64
SLURM_TASKS_PER_NODE=34,26,2(x2)
SLURM_JOB_ID=5462202
SLURM_JOB_USER=jingchao
SLURM_JOB_UID=3663
SLURM_NODEID=0
SLURM_SUBMIT_DIR=/lustre/work/swanson/jingchao/mpitest/examples/1.10.1/3
SLURM_TASK_PID=53822
SLURM_NPROCS=64
SLURM_CPUS_ON_NODE=36
SLURM_PROCID=0
SLURM_JOB_NODELIST=c[3005,3011,3019,3105]
SLURM_LOCALID=0
SLURM_JOB_CPUS_PER_NODE=36,26,2(x2)
SLURM_CLUSTER_NAME=tusker
SLURM_GTIDS=0
SLURM_SUBMIT_HOST=login.tusker.hcc.unl.edu
SLURM_JOB_PARTITION=batch
SLURM_JOB_NUM_NODES=4
SLURM_MEM_PER_NODE=2048

v-1.8.4 "mpirun" and v-1.10.1 "mpirun --hetero-nodes -bind-to core -map-by 
core" now give comparable results.

v-1.10.1 "mpirun" still have unstable performance.



I tried adding the following three lines to the "openmpi-mca-params.conf" file

"

orte_hetero_nodes=1
hwloc_base_binding_policy=core
rmaps_base_bycore=1
"

and ran "mpirun lmp_ompi_g++ < in.wall.2d" with v-1.10.1.


This works for most tests but some jobs are hanging with this message:

--
The following command line options and corresponding MCA parameter have
been deprecated and replaced as follows:

  Command line options:
Deprecated:  --bycore, -bycore
Replacement: --map-by core

  Equivalent MCA parameter:
Deprecated:  rmaps_base_bycore
Replacement: rmaps_base_mapping_policy=core

The deprecated forms *will* disappear in a future version of Open MPI.
Please update to the new syntax.
--

Did I miss something in the "openmpi-mca-params.conf" file?


Thanks,


Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400



From: users  on behalf of Gilles Gouaillardet 

Sent: Wednesday, December 16, 2015 6:11 PM
To: Open MPI Users
Subject: Re: [OMPI users] performance issue with OpenMPI 1.10.1

Binding is somehow involved in this, and I do not believe vader or openib is 
involved here.

Could you please run again with the two ompi versions but in the *same* job ?
and before invoking mpirun, could you do
env | grep SLURM

per your slurm request, you are running 64 tasks on 4 nodes.
with 1.8.4, you end up running 14+14+14+22 tasks (not ideal, but quite balanced)
with 1.10.1, you end up running 2+2+12+48 tasks (very unbalanced)
so it is quite unfair to compare these two runs.

also, still in the same job, can you add a third run with 1.10.1 and the 
following options
mpirun --hetero-nodes -bind-to core -map-by core ...
and see if it helps

Cheers,

Gilles




On 12/17/2015 6:47 AM, Jingchao Zhang wrote:

Those jobs were launched with mpirun. Please see the attached files for the 
binding report with OMPI_MCA_hwloc_base_report_bindings=1.


Here is a snapshot for v-1.10.1:

[c2613.tusker.hcc.unl.edu:12049] MCW rank 0 is not bound (or bound to all 
available processors)
[c2613.tusker.hcc.unl.edu:12049] MCW rank 1 is not bound (or bound to all 
available processors)
[c2615.tusker.hcc.unl.edu:11136] MCW rank 2 is not bound (or bound to all 
available processors)
[c2615.tusker.hcc.unl.edu:11136] MCW rank 3 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 9 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 10 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 11 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 12 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 13 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 14 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 15 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 4 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 5 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 6 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 7 is not bound (or bound to all 
available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 8 is not bound (or bound to all 
available proce

Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-17 Thread Ralph Castain
Glad you resolved it! The following MCA param has changed its name:

> rmaps_base_bycore=1

should now be

rmaps_base_mapping_policy=core

HTH
Ralph
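
Putting that together, the three lines in the "openmpi-mca-params.conf" file 
from the earlier post would then read as follows (shown as a reference sketch; 
the first two parameters are unchanged):

orte_hetero_nodes=1
hwloc_base_binding_policy=core
rmaps_base_mapping_policy=core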



Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-17 Thread Novosielski, Ryan
I'm no expert, but this one is pretty obvious. The error message says exactly 
what you should change:

 Equivalent MCA parameter:
Deprecated:  rmaps_base_bycore
Replacement: rmaps_base_mapping_policy=core

--
 *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
 || \\UTGERS  |-*O*-
 ||_// Biomedical | Ryan Novosielski - Senior Technologist
 || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
 ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
  `'


Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-17 Thread Jingchao Zhang
Thank you all. That's my oversight. I got a similar error with 
"hwloc_base_binding_policy=core", so I thought it was the same.

Cheers,

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

