Re: [OMPI users] usNIC BTL unrecognized payload type 255 when running under SLURM srun but not mpiexec/mpirun

2017-11-13 Thread Forai,Petar
One more thing to add: this is 100% reproducible when running with srun, and does not
occur when running with mpiexec/mpirun:

Mpiexec
[adm_forai@login-01 ~]$ srun -N 2 -n 2 --pty bash
[adm_forai@cn-21 ~]$ mpiexec -np 2 IMB-MPI1 PingPong
libibverbs: Warning: no node_type attr under /sys/class/infiniband/usnic_0.
libibverbs: Warning: no node_type attr under /sys/class/infiniband/usnic_0.
libibverbs: Warning: no node_type attr under /sys/class/infiniband/usnic_0.
 benchmarks to run PingPong
#
#Intel (R) MPI Benchmarks 4.1, MPI-1 part
#
# Date  : Mon Nov 13 14:28:57 2017
# Machine   : x86_64
# System: Linux
# Release   : 3.10.0-514.2.2.el7.x86_64
# Version   : #1 SMP Tue Dec 6 23:06:41 UTC 2016
# MPI Version   : 3.1
# MPI Thread Environment:

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time



# Calling sequence was:

# IMB-MPI1 PingPong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype   :   MPI_BYTE
# MPI_Datatype for reductions:   MPI_FLOAT
# MPI_Op :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong

#---
# Benchmarking PingPong
# #processes = 2
#---
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        11.22         0.00
            1         1000        11.26         0.08
            2         1000        11.18         0.17
            4         1000        11.16         0.34
            8         1000        11.19         0.68
           16         1000        11.18         1.36
           32         1000        11.28         2.71
           64         1000        11.40         5.35
          128         1000        11.62        10.51
          256         1000        12.08        20.20
          512         1000        12.75        38.30
         1024         1000        14.44        67.61
         2048         1000        16.00       122.04
         4096         1000        19.19       203.54
         8192         1000        25.41       307.42
        16384         1000        30.88       506.04
        32768         1000        38.29       816.18
        65536          640        56.42      1107.79
       131072          320        87.01      1436.58
       262144          160       162.14      1541.92
       524288           80       257.73      1940.02
      1048576           40       450.37      2220.39
      2097152           20       806.20      2480.79
      4194304           10      1776.69      2251.38


# All processes entering MPI_Finalize

[adm_forai@cn-21 ~]$


SRUN


[adm_forai@login-01 ~]$ srun -N 2 -n 2 IMB-MPI1 PingPong
libibverbs: Warning: no node_type attr under /sys/class/infiniband/usnic_0.
libibverbs: Warning: no node_type attr under /sys/class/infiniband/usnic_0.
 benchmarks to run PingPong
#
#Intel (R) MPI Benchmarks 4.1, MPI-1 part
#
# Date  : Mon Nov 13 14:27:26 2017
# Machine   : x86_64
# System: Linux
# Release   : 3.10.0-514.2.2.el7.x86_64
# Version   : #1 SMP Tue Dec 6 23:06:41 UTC 2016
# MPI Version   : 3.1
# MPI Thread Environment:

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time



# Calling sequence was:

# /software/171020/software/imb/4.1-foss-2017a/bin/IMB-MPI1 PingPong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype   :   MPI_BYTE
# MPI_Datatype for reductions:   MPI_FLOAT
# MPI_Op :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong

#---
# Benchmarking PingPong
# #processes = 2
#---
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        11.73         0.00
            1         1000        11.83         0.08
            2         1000        11.66         0.16
            4         1000        11.64         0.33
            8         1000        11.70         0.65
           16         1000        11.73         1.30
           32
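One way to narrow down where the two launchers diverge is to compare which BTLs each rank selects under both, and whether the hang is specific to the usnic BTL. This is only a sketch: the MCA parameter names are the standard Open MPI ones, but whether this Slurm build supports `--mpi=pmi2` is an assumption.

```shell
# Show which BTL each rank actually selects under both launchers
mpirun --mca btl_base_verbose 100 -np 2 IMB-MPI1 PingPong
srun -N 2 -n 2 --mpi=pmi2 env OMPI_MCA_btl_base_verbose=100 IMB-MPI1 PingPong

# Temporarily exclude the usnic BTL to test whether the srun hang is usNIC-specific
srun -N 2 -n 2 --mpi=pmi2 env OMPI_MCA_btl=^usnic IMB-MPI1 PingPong
```

If the third run completes over TCP, that points at a usNIC/launcher interaction rather than a general srun problem.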

[OMPI users] Invalid results with OpenMPI on Ubuntu Artful because of --enable-heterogeneous

2017-11-13 Thread Xavier Besseron
Dear all,

I want to share with you the following issue with the OpenMPI shipped with the
latest Ubuntu Artful. It is OpenMPI 2.1.1 compiled with the
option --enable-heterogeneous.

Looking at this issue https://github.com/open-mpi/ompi/issues/171, it
appears that this option is broken and should not be used.
This option has been used in Debian/Ubuntu since 2010
(http://changelogs.ubuntu.com/changelogs/pool/universe/o/openmpi/openmpi_2.1.1-6/changelog)
and is still used today. Apparently, nobody has complained so far.

However, now I complain :-)
I've found a simple example for which this option causes invalid results in
OpenMPI.


int A = 666, B = 42;
MPI_Irecv(&A, 1, MPI_INT, MPI_ANY_SOURCE, tag, comm, &req);
MPI_Send(&B, 1, MPI_INT, my_rank, tag, comm);
MPI_Wait(&req, &status);

# After that, when compiled with --enable-heterogeneous, we have A != B

This happens with just a single process. The full example is in attachment
(to be run with "mpirun -n 1 ./bug_openmpi_artful").
I extracted and simplified the code from the Zoltan library with which I
initially noticed the issue.

I find it annoying that Ubuntu distributes a broken OpenMPI.
I've also tested OpenMPI 2.1.1, 2.1.2 and 3.0.0 and using
--enable-heterogeneous causes the bug systematically.
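For what it's worth, the symptom (A ending up as a garbled value instead of B's 42) is consistent with a spurious byte-order conversion somewhere in the receive path. The following is only an illustration of that effect, not the confirmed root cause: it shows what a 32-bit int looks like after one unnecessary byte swap.

```python
import struct

B = 42
# Pack as big-endian, then unpack as little-endian:
# equivalent to applying one spurious byte swap to the value
swapped = struct.unpack('<i', struct.pack('>i', B))[0]
print(B, swapped)  # 42 704643072
assert swapped != B
```

42 is 0x0000002A; after a swap it reads as 0x2A000000, a very different number, which is the kind of corruption a wrongly-triggered heterogeneous convertor would produce.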


Finally, my points/questions are:

- To share with you this small example in case you want to debug it

- What is the status of issue https://github.com/open-mpi/ompi/issues/171 ?
Is this option still considered broken?
If yes, I encourage you to remove it or mark it as deprecated to avoid this
kind of mistake in the future.

- To get the feedback of OpenMPI developers on the use of this option,
which might convince the Debian/Ubuntu maintainer to remove this flag.
I have opened a bug on Ubuntu for it:
https://bugs.launchpad.net/ubuntu/+source/openmpi/+bug/1731938


Thanks!

Xavier


-- 
Dr Xavier BESSERON
Research associate
FSTC, University of Luxembourg
Campus Belval, Office MNO E04 0415-040
Phone: +352 46 66 44 5418
http://luxdem.uni.lu/
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
int rc;

rc = MPI_Init(&argc, &argv);
if (rc != MPI_SUCCESS) abort();

int my_rank;
rc = MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (rc != MPI_SUCCESS) abort();


int A = 666;
int B = 42;


printf("[BEFORE] A = %d - B = %d\n", A, B);

int tag = 2999;
MPI_Comm comm = MPI_COMM_WORLD;
MPI_Status status;
MPI_Request req;

rc = MPI_Irecv(&A, 1, MPI_INT, MPI_ANY_SOURCE, tag, comm, &req);
if (rc != MPI_SUCCESS) abort();

rc = MPI_Send(&B, 1, MPI_INT, my_rank, tag, comm);
if (rc != MPI_SUCCESS) abort();
  
rc = MPI_Wait(&req, &status);
if (rc != MPI_SUCCESS) abort();

printf("[AFTER]  A = %d - B = %d\n", A, B);


if ( A != B ) 
{
printf("Error!!!\n");
}


rc = MPI_Finalize();
if (rc != MPI_SUCCESS) abort();

return 0;
}
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Failed to register memory (openmpi 2.0.2)

2017-11-13 Thread Mark Dixon

Hi there,

We're intermittently seeing messages (below) about failing to register 
memory with openmpi 2.0.2 on centos7 / Mellanox FDR Connect-X 3 / 24 core 
126G RAM Broadwell nodes and the vanilla IB stack as shipped by centos.


(We previously saw similar messages for the "ud" oob component but, as
recommended in this thread, we stopped oob from using openib via an MCA
parameter.)


I've checked to see what the registered memory limit is (by setting 
mlx4_core's debug_level, rebooting and examining kernel messages) and it's 
double the system RAM - which I understand is the recommended setting.
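For reference, here is a quick way to check the relevant settings on a compute node. This is a sketch: the locked-memory limit check works anywhere, while the mlx4 parameter files only exist on hosts with a Mellanox ConnectX-3 HCA and a driver that exposes them.

```shell
# Locked-memory limit for the current shell
# (should report "unlimited" on InfiniBand nodes)
ulimit -l

# mlx4 registered-memory tuning parameters, if the module exposes them
for p in /sys/module/mlx4_core/parameters/log_num_mtt \
         /sys/module/mlx4_core/parameters/log_mtts_per_seg; do
    [ -r "$p" ] && printf '%s = %s\n' "$p" "$(cat "$p")"
done
```

Note that `ulimit -l` must be checked in the environment the MPI ranks actually inherit (e.g. under the batch system), not just in a login shell.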


Any ideas about what might be going on, please?

Thanks,

Mark


--
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    dc1s0b1a
  OMPI source:   btl_openib.c:752
  Function:  opal_free_list_init()
  Device:    mlx4_0
  Memlock limit: unlimited

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--
[dc1s0b1a][[59067,1],0][btl_openib.c:1035:mca_btl_openib_add_procs] could not 
prepare openib device for use
[dc1s0b1a][[59067,1],0][btl_openib.c:1186:mca_btl_openib_get_ep] could not 
prepare openib device for use
[dc1s0b1a][[59067,1],0][connect/btl_openib_connect_udcm.c:1522:udcm_find_endpoint]
 could not find endpoint with port: 1, lid: 69, msg_type: 100


On Thu, 19 Oct 2017, Mark Dixon wrote:


Thanks Ralph, will do.

Cheers,

Mark

On Wed, 18 Oct 2017, r...@open-mpi.org wrote:


 Put “oob=tcp” in your default MCA param file


 On Oct 18, 2017, at 9:00 AM, Mark Dixon  wrote:

 Hi,

 We're intermittently seeing messages (below) about failing to register
 memory with openmpi 2.0.2 on centos7 / Mellanox FDR Connect-X 3 and the
 vanilla IB stack as shipped by centos.

 We're not using any mlx4_core module tweaks at the moment. On earlier
 machines we used to set registered memory as per the FAQ, but neither
 log_num_mtt nor num_mtt seem to exist these days (according to
 /sys/module/mlx4_*/parameters/*), which makes it somewhat difficult to
 follow the FAQ.

 The output of 'ulimit -l' shows as unlimited for every rank.

 Does anyone have any advice, please?

 Thanks,

 Mark

 -
 Failed to register memory region (MR):

 Hostname: dc1s0b1c
 Address:  ec5000
 Length:   20480
 Error:Cannot allocate memory
 --
 --
 Open MPI has detected that there are UD-capable Verbs devices on your
 system, but none of them were able to be setup properly.  This may
 indicate a problem on this system.

 Your job will continue, but Open MPI will ignore the "ud" oob component
 in this run.

Re: [OMPI users] Invalid results with OpenMPI on Ubuntu Artful because of --enable-heterogeneous

2017-11-13 Thread Gilles Gouaillardet
Xavier,

thanks for the report, i will have a look at it.

is the bug triggered by MPI_ANY_SOURCE ?
/* e.g. does it work if you MPI_Irecv(..., myrank, ...) ? */


Unless Ubuntu wants out-of-the-box support between heterogeneous nodes
(for example x86_64 and ppc64), there is little to no point in
configuring Open MPI with the --enable-heterogeneous option.


Cheers,

Gilles



Re: [OMPI users] Invalid results with OpenMPI on Ubuntu Artful because of --enable-heterogeneous

2017-11-13 Thread Gilles Gouaillardet
Xavier,

I confirm there is a bug when using MPI_ANY_SOURCE with Open MPI
configured with --enable-heterogeneous.

I made https://github.com/open-mpi/ompi/pull/4501 in order to fix
that, and will merge and backport once reviewed.


Cheers,

Gilles
