Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Åke Sandgren
Since I'm seeing similar Bus errors from both Open MPI and other places
on our system, I'm wondering: what hardware do you have?

CPUs, interconnect, etc.

On 03/23/2017 08:45 AM, Götz Waschk wrote:
> Hi Howard,
> 
> I have attached my config.log file for version 2.1.0. I have based it
> on the OpenHPC package. Unfortunately, it still crashes with disabling
> the vader btl with this command line:
> mpirun --mca btl "^vader" IMB-MPI1
> 
> 
> [pax11-10:44753] *** Process received signal ***
> [pax11-10:44753] Signal: Bus error (7)
> [pax11-10:44753] Signal code: Non-existant physical address (2)
> [pax11-10:44753] Failing at address: 0x2b3989e27a00

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se


Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Åke,

I have E5-2697A CPUs and Mellanox ConnectX-3 FDR Infiniband. I'm using
EL7.3 as the operating system.

Regards, Götz Waschk

On Thu, Mar 23, 2017 at 9:28 AM, Åke Sandgren  wrote:
> Since I'm seeing similar Bus errors from both Open MPI and other places
> on our system, I'm wondering: what hardware do you have?
>
> CPUs, interconnect, etc.
>
> On 03/23/2017 08:45 AM, Götz Waschk wrote:
>> Hi Howard,
>>
>> I have attached my config.log file for version 2.1.0. I have based it
>> on the OpenHPC package. Unfortunately, it still crashes with disabling
>> the vader btl with this command line:
>> mpirun --mca btl "^vader" IMB-MPI1
>>
>>
>> [pax11-10:44753] *** Process received signal ***
>> [pax11-10:44753] Signal: Bus error (7)
>> [pax11-10:44753] Signal code: Non-existant physical address (2)
>> [pax11-10:44753] Failing at address: 0x2b3989e27a00
>
> --
> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
> Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
> Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



-- 
AL I:40: Do what thou wilt shall be the whole of the Law.

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Howard,

I had tried to send the config.log of my 2.1.0 build, but I guess it was
too big for the list. I'm trying again with a compressed file.
I have based the build on the OpenHPC package. Unfortunately, it still
crashes even with the vader btl disabled, using this command line:
mpirun --mca btl "^vader" IMB-MPI1


[pax11-10:44753] *** Process received signal ***
[pax11-10:44753] Signal: Bus error (7)
[pax11-10:44753] Signal code: Non-existant physical address (2)
[pax11-10:44753] Failing at address: 0x2b3989e27a00
[pax11-10:44753] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b3976f44370]
[pax11-10:44753] [ 1]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(+0x559a)[0x2b398545259a]
[pax11-10:44753] [ 2]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x1df)[0x2b39777bb78f]
[pax11-10:44753] [ 3]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2b3985450562]
[pax11-10:44753] [ 4]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2b3985d78a3f]
[pax11-10:44753] [ 5]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2b3985d79ad7]
[pax11-10:44753] [ 6]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2b3976cda620]
[pax11-10:44753] [ 7]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2b3976cdb8f0]
[pax11-10:44753] [ 8]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2b3976ca36ab]
[pax11-10:44753] [ 9] IMB-MPI1[0x40b2ff]
[pax11-10:44753] [10] IMB-MPI1[0x402646]
[pax11-10:44753] [11]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b3977172b35]
[pax11-10:44753] [12] IMB-MPI1[0x401f79]
[pax11-10:44753] *** End of error message ***
[pax11-10:44752] *** Process received signal ***
[pax11-10:44752] Signal: Bus error (7)
[pax11-10:44752] Signal code: Non-existant physical address (2)
[pax11-10:44752] Failing at address: 0x2ab0d270d3e8
[pax11-10:44752] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab0bf7ec370]
[pax11-10:44752] [ 1]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_allocator_bucket.so(mca_allocator_bucket_alloc_align+0x89)[0x2ab0c2eed1c9]
[pax11-10:44752] [ 2]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmca_common_sm.so.20(+0x1495)[0x2ab0cde8d495]
[pax11-10:44752] [ 3]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x277)[0x2ab0c0063827]
[pax11-10:44752] [ 4]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2ab0cdc87562]
[pax11-10:44752] [ 5]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2ab0ce630a3f]
[pax11-10:44752] [ 6]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2ab0ce631ad7]
[pax11-10:44752] [ 7]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2ab0bf582620]
[pax11-10:44752] [ 8]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2ab0bf5838f0]
[pax11-10:44752] [ 9]
/opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2ab0bf54b6ab]
[pax11-10:44752] [10] IMB-MPI1[0x40b2ff]
[pax11-10:44752] [11] IMB-MPI1[0x402646]
[pax11-10:44752] [12]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab0bfa1ab35]
[pax11-10:44752] [13] IMB-MPI1[0x401f79]
[pax11-10:44752] *** End of error message ***
--
mpirun noticed that process rank 340 with PID 44753 on node pax11-10
exited on signal 7 (Bus error).


config.log.xz
Description: application/xz

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Åke Sandgren
E5-2697A which version? v4?

On 03/23/2017 09:53 AM, Götz Waschk wrote:
> Hi Åke,
> 
> I have E5-2697A CPUs and Mellanox ConnectX-3 FDR Infiniband. I'm using
> EL7.3 as the operating system.


-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
On Thu, Mar 23, 2017 at 9:59 AM, Åke Sandgren  wrote:
> E5-2697A which version? v4?
Hi, yes, that one:
Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz

Regards, Götz

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Åke Sandgren
OK, we have E5-2690 v4's and Connect-IB.

On 03/23/2017 10:11 AM, Götz Waschk wrote:
> On Thu, Mar 23, 2017 at 9:59 AM, Åke Sandgren  
> wrote:
>> E5-2697A which version? v4?
> Hi, yes, that one:
> Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
> 
> Regards, Götz

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Gilles Gouaillardet
Can you please try
mpirun --mca btl tcp,self ...
and, if that works,
mpirun --mca btl openib,self ...

Then can you try
mpirun --mca coll ^tuned --mca btl tcp,self ...

That will help figure out whether the error is in the pml or the coll
framework/module.

Cheers,

Gilles

On Thursday, March 23, 2017, Götz Waschk  wrote:

> Hi Howard,
>
> I have attached my config.log file for version 2.1.0. I have based it
> on the OpenHPC package. Unfortunately, it still crashes with disabling
> the vader btl with this command line:
> mpirun --mca btl "^vader" IMB-MPI1
>
>
> [pax11-10:44753] *** Process received signal ***
> [pax11-10:44753] Signal: Bus error (7)
> [pax11-10:44753] Signal code: Non-existant physical address (2)
> [pax11-10:44753] Failing at address: 0x2b3989e27a00
> [pax11-10:44753] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b3976f44370]
> [pax11-10:44753] [ 1]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.
> so(+0x559a)[0x2b398545259a]
> [pax11-10:44753] [ 2]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(
> opal_free_list_grow_st+0x1df)[0x2b39777bb78f]
> [pax11-10:44753] [ 3]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.
> so(mca_btl_sm_sendi+0x272)[0x2b3985450562]
> [pax11-10:44753] [ 4]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.
> so(+0x8a3f)[0x2b3985d78a3f]
> [pax11-10:44753] [ 5]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.
> so(mca_pml_ob1_send+0x4a7)[0x2b3985d79ad7]
> [pax11-10:44753] [ 6]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_
> coll_base_sendrecv_nonzero_actual+0x110)[0x2b3976cda620]
> [pax11-10:44753] [ 7]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_
> coll_base_allreduce_intra_ring+0x860)[0x2b3976cdb8f0]
> [pax11-10:44753] [ 8]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_
> Allreduce+0x17b)[0x2b3976ca36ab]
> [pax11-10:44753] [ 9] IMB-MPI1[0x40b2ff]
> [pax11-10:44753] [10] IMB-MPI1[0x402646]
> [pax11-10:44753] [11]
> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b3977172b35]
> [pax11-10:44753] [12] IMB-MPI1[0x401f79]
> [pax11-10:44753] *** End of error message ***
> [pax11-10:44752] *** Process received signal ***
> [pax11-10:44752] Signal: Bus error (7)
> [pax11-10:44752] Signal code: Non-existant physical address (2)
> [pax11-10:44752] Failing at address: 0x2ab0d270d3e8
> [pax11-10:44752] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab0bf7ec370]
> [pax11-10:44752] [ 1]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_
> allocator_bucket.so(mca_allocator_bucket_alloc_align+0x89)[0x2ab0c2eed1c9]
> [pax11-10:44752] [ 2]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmca_common_sm.so.
> 20(+0x1495)[0x2ab0cde8d495]
> [pax11-10:44752] [ 3]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(
> opal_free_list_grow_st+0x277)[0x2ab0c0063827]
> [pax11-10:44752] [ 4]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.
> so(mca_btl_sm_sendi+0x272)[0x2ab0cdc87562]
> [pax11-10:44752] [ 5]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.
> so(+0x8a3f)[0x2ab0ce630a3f]
> [pax11-10:44752] [ 6]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.
> so(mca_pml_ob1_send+0x4a7)[0x2ab0ce631ad7]
> [pax11-10:44752] [ 7]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_
> coll_base_sendrecv_nonzero_actual+0x110)[0x2ab0bf582620]
> [pax11-10:44752] [ 8]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_
> coll_base_allreduce_intra_ring+0x860)[0x2ab0bf5838f0]
> [pax11-10:44752] [ 9]
> /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_
> Allreduce+0x17b)[0x2ab0bf54b6ab]
> [pax11-10:44752] [10] IMB-MPI1[0x40b2ff]
> [pax11-10:44752] [11] IMB-MPI1[0x402646]
> [pax11-10:44752] [12]
> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab0bfa1ab35]
> [pax11-10:44752] [13] IMB-MPI1[0x401f79]
> [pax11-10:44752] *** End of error message ***
> --
> mpirun noticed that process rank 340 with PID 44753 on node pax11-10
> exited on signal 7 (Bus error).
> --
>

[OMPI users] Errors and segmentation faults when installing openmpi-2.1

2017-03-23 Thread Dimitrova, Maria

Hello,


I am setting up a freshly installed Ubuntu 16.04 computer to do some parallel
programming, and I need the MPI compilers for C and Fortran. Using the tar
archive provided on the download page produces a series of errors (a very long
list, because I tried running make all many times, but I can attach it if
necessary). Instead, I tried just "sudo apt install libmpich-dev", but again a
segmentation fault came up:

Get:4 http://www.nic.funet.fi/pub/mirrors/archive.ubuntu.com xenial/universe 
amd64 mpich amd64 3.2-6build1 [197 kB]
Fetched 2 427 kB in 0s (4 238 kB/s)
Selecting previously unselected package hwloc-nox.
(Reading database ... 212998 files and directories currently installed.)
Preparing to unpack .../hwloc-nox_1.11.2-3_amd64.deb ...
Unpacking hwloc-nox (1.11.2-3) ...
Selecting previously unselected package libmpich12:amd64.
Preparing to unpack .../libmpich12_3.2-6build1_amd64.deb ...
Unpacking libmpich12:amd64 (3.2-6build1) ...
Selecting previously unselected package libmpich-dev.
Preparing to unpack .../libmpich-dev_3.2-6build1_amd64.deb ...
Unpacking libmpich-dev (3.2-6build1) ...
Selecting previously unselected package mpich.
Preparing to unpack .../mpich_3.2-6build1_amd64.deb ...
Unpacking mpich (3.2-6build1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for libc-bin (2.23-0ubuntu7) ...
Setting up openmpi (2.1.0-2) ...
chown: invalid user: ‘jsquyres:named’
chown: invalid user: ‘jsquyres:named’
dpkg: error processing package openmpi (--configure):
 subprocess installed post-installation script returned error exit status 1
Setting up hwloc-nox (1.11.2-3) ...
Setting up libmpich12:amd64 (3.2-6build1) ...
Setting up libmpich-dev (3.2-6build1) ...
update-alternatives: using /usr/include/mpich to provide /usr/include/mpi (mpi) 
in auto mode
Setting up mpich (3.2-6build1) ...
Processing triggers for libc-bin (2.23-0ubuntu7) ...
Errors were encountered while processing:
 openmpi


I suspect that the problem does not originate specifically from Open MPI, but
so far the other packages that I have installed have been functioning properly.
Could you suggest a solution? Thank you in advance.



Best regards,

Maria

Re: [OMPI users] Errors and segmentation faults when installing openmpi-2.1

2017-03-23 Thread Jeff Squyres (jsquyres)
Note that Open MPI and MPICH are different implementations of the MPI 
specification.

If you are mixing an Open MPI tarball install with an MPICH apt install, things 
will likely go downhill from there.

You need to ensure that you use Open MPI *or* MPICH, not a mixture of the two.


> On Mar 23, 2017, at 5:38 AM, Dimitrova, Maria  
> wrote:
> 
> 
> Hello,
> 
> I am setting up a freshly installed Ubuntu 16.04 computer to do some parallel 
> programming and I need the MPI compilers for C and Fortran. Using the 
> provided tar archive in the download page produces a series of errors (a very 
> long list because I tried running make all many times but I can attach it if 
> necessary). Instead, I tried just "sudo apt install libmpich-dev" but again a 
> segmentation fault came up: 
> 
> Get:4 http://www.nic.funet.fi/pub/mirrors/archive.ubuntu.com xenial/universe 
> amd64 mpich amd64 3.2-6build1 [197 kB]
> Fetched 2 427 kB in 0s (4 238 kB/s)
> Selecting previously unselected package hwloc-nox.
> (Reading database ... 212998 files and directories currently installed.)
> Preparing to unpack .../hwloc-nox_1.11.2-3_amd64.deb ...
> Unpacking hwloc-nox (1.11.2-3) ...
> Selecting previously unselected package libmpich12:amd64.
> Preparing to unpack .../libmpich12_3.2-6build1_amd64.deb ...
> Unpacking libmpich12:amd64 (3.2-6build1) ...
> Selecting previously unselected package libmpich-dev.
> Preparing to unpack .../libmpich-dev_3.2-6build1_amd64.deb ...
> Unpacking libmpich-dev (3.2-6build1) ...
> Selecting previously unselected package mpich.
> Preparing to unpack .../mpich_3.2-6build1_amd64.deb ...
> Unpacking mpich (3.2-6build1) ...
> Processing triggers for man-db (2.7.5-1) ...
> Processing triggers for libc-bin (2.23-0ubuntu7) ...
> Setting up openmpi (2.1.0-2) ...
> chown: invalid user: ‘jsquyres:named’
> chown: invalid user: ‘jsquyres:named’
> dpkg: error processing package openmpi (--configure):
>  subprocess installed post-installation script returned error exit status 1
> Setting up hwloc-nox (1.11.2-3) ...
> Setting up libmpich12:amd64 (3.2-6build1) ...
> Setting up libmpich-dev (3.2-6build1) ...
> update-alternatives: using /usr/include/mpich to provide /usr/include/mpi 
> (mpi) in auto mode
> Setting up mpich (3.2-6build1) ...
> Processing triggers for libc-bin (2.23-0ubuntu7) ...
> Errors were encountered while processing:
>  openmpi
> 
> 
> I suspect that the problem does not originate specifically from Open MPI but 
> so far other packages that I installed have been functioning properly. Could 
> you suggest some solution? Thank you in advance. 
> 
> 
> Best regards,
> Maria


-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI users] a question about MPI dynamic process manage

2017-03-23 Thread Jeff Squyres (jsquyres)
It's likely a lot more efficient to MPI_COMM_SPAWN *all* of your children at
once, and then subdivide the resulting newcomm communicator as desired.

It is *possible* to have a series of MPI_COMM_SPAWN calls that each spawn a
single child process, and then later join all of those children into a single
communicator, but it is somewhat tricky and likely not worth it (i.e., you'll
save a lot of code complexity if you can spawn all the children at once).
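
To illustrate, here is a rough, untested sketch (the "nchildren" count, the
"./child" command name, and the split rule below are placeholders, not taken
from your code): the parents make one collective MPI_Comm_spawn call, so all
of the children come up sharing a single MPI_COMM_WORLD of their own.

  program spawn_parent
    use mpi_f08
    implicit none
    integer :: nchildren, ierr
    type(MPI_Comm) :: intercomm
    call MPI_Init(ierr)
    nchildren = 8        ! placeholder: e.g. one child per node
    ! One collective spawn from all parents; root 0 supplies the command.
    call MPI_Comm_spawn("./child", MPI_ARGV_NULL, nchildren, MPI_INFO_NULL, &
                        0, MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, ierr)
    ! ... exchange data with the children over the intercommunicator ...
    call MPI_Finalize(ierr)
  end program spawn_parent

On the child side (the separate "./child" executable), the spawned processes
already share their own MPI_COMM_WORLD, so they can subdivide it directly
instead of going through MPI_Comm_accept/MPI_Comm_connect:

  program spawn_child
    use mpi_f08
    implicit none
    integer :: rank, ierr
    type(MPI_Comm) :: parent, subcomm
    call MPI_Init(ierr)
    call MPI_Comm_get_parent(parent, ierr)   ! intercommunicator to the parents
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    ! Placeholder grouping rule; split however your application needs.
    call MPI_Comm_split(MPI_COMM_WORLD, rank / 2, rank, subcomm, ierr)
    ! ... message passing among the children via subcomm ...
    call MPI_Finalize(ierr)
  end program spawn_child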



> On Mar 23, 2017, at 12:23 AM, gzzh...@buaa.edu.cn wrote:
> 
> Hi team:
> I have a question about MPI dynamic process management; I hope you can
> provide some help.
> First of all, the MPI program runs on multiple nodes. The group of
> MPI_COMM_WORLD was split into subgroups by node, and sub-communicators
> were created for each of them, so that the MPI processes on one node can
> communicate with each other through these sub-communicators.
> Then MPI_Comm_spawn("./child", NULL, 1, hostinfo, 0, sub-communicator,
> &newcomm, &errs) is used to spawn one child process on each node. The
> child processes are expected to form a group and then create an
> intra-communicator, so that message passing can be done between these
> child processes.
> The question is: how can I achieve that? Or do I have to use
> MPI_Comm_accept and MPI_Comm_connect to establish a connection?
> 
> best regards!
> 
> -
> Eric
> 


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Gilles,

I'm currently testing and here are some preliminary results:

On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet
 wrote:
> Can you please try
> mpirun --mca btl tcp,self ...

this failed to produce any program output; there were lots of errors like this:
[pax11-00][[54124,1],31][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.225.202 failed: Connection timed out (110)

I had to terminate the job.

That's why I added the option --mca btl_tcp_if_exclude ib0. In this
case, the program started to produce output, but hung early on with
this error:
 
[pax11-00][[61232,1],31][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect]
connect() to 127.0.0.1 failed: Connection refused (111)
[pax11-01][[61232,1],63][btl_tcp_endpoint.c:649:mca_btl_tcp_endpoint_recv_connect_ack]
received unexpected process identifier [[61232,1],33]

I have aborted that job as well.


> And if it works
> mpirun --mca btl openib,self ...
This is running fine so far but will take some more time.

Regards, Götz

Re: [OMPI users] Help with Open MPI 2.1.0 and PGI 16.10: Configure and C++

2017-03-23 Thread Reuti
Hi,

Am 22.03.2017 um 20:12 schrieb Matt Thompson:

> […]
> 
> Ah. PGI 16.9+ now use pgc++ to do C++ compiling, not pgcpp. So, I hacked 
> configure so that references to pgCC (nonexistent on macOS) are gone and all 
> pgcpp became pgc++, but:

This is not unique to macOS. pgCC used the STLport STL and is no longer
included with their compiler suite; pgc++ now uses a GCC-compatible library
format and replaces the former compiler on Linux too.

There (on Linux) I get, ignoring the 'gnu' vendor output during `configure` and
compiling anyway:

$ mpic++ --version

pgc++ 16.10-0 64-bit target on x86-64 Linux -tp bulldozer
The Portland Group - PGI Compilers and Tools
Copyright (c) 2016, NVIDIA CORPORATION.  All rights reserved.

Maybe some options for the `mpic++` wrapper were just set the wrong way?

===

Nevertheless: did you see the error on the Mac at the end of the `configure`
step too, or was it gone after the hints in the discussion link you posted?
I still see it there; it is about "libevent".

-- Reuti


> 
> *** C++ compiler and preprocessor
> checking whether we are using the GNU C++ compiler... yes
> checking whether pgc++ accepts -g... yes
> checking dependency style of pgc++... none
> checking how to run the C++ preprocessor... pgc++ -E
> checking for the C++ compiler vendor... gnu
> 
> Well, at this point, I think I'm stopping until I get help. Will this chunk 
> of configure always return gnu for PGI? I know the C part returns 'portland 
> group':
> 
> *** C compiler and preprocessor
> checking for gcc... (cached) pgcc
> checking whether we are using the GNU C compiler... (cached) no
> checking whether pgcc accepts -g... (cached) yes
> checking for pgcc option to accept ISO C89... (cached) none needed
> checking whether pgcc understands -c and -o together... (cached) yes
> checking for pgcc option to accept ISO C99... none needed
> checking for the C compiler vendor... portland group
> 
> so I thought the C++ section would as well. I also tried passing in 
> --enable-mpi-cxx, but that did nothing.
> 
> Is this just a red herring? My real concern is with pgfortran/mpifort, but I 
> thought I'd start with this. If this is okay, I'll move on and detail the 
> fortran issues I'm having.
> 
> Matt
> --
> Matt Thompson
> Man Among Men
> Fulcrum of History




Re: [OMPI users] Help with Open MPI 2.1.0 and PGI 16.10: Configure and C++

2017-03-23 Thread Gilles Gouaillardet
Matt,

a C++ compiler is required to configure Open MPI.
That being said, the C++ compiler is only used if you build the C++ bindings
(which were removed in MPI-3). And unless you plan to use the mpic++ wrapper
(with or without the C++ bindings), a valid C++ compiler is not required at
all.
/* configure still requires one, and that could be improved */

My point is that you should not worry too much about configure messages
related to C++; you should instead focus on the Fortran issue.

Cheers,

Gilles

On Thursday, March 23, 2017, Matt Thompson  wrote:

> All, I'm hoping one of you knows what I might be doing wrong here.  I'm
> trying to use Open MPI 2.1.0 for PGI 16.10 (Community Edition) on macOS.
> Now, I built it a la:
>
> http://www.pgroup.com/userforum/viewtopic.php?p=21105#21105
>
> and found that it built, but the resulting mpifort, etc were just not
> good. Couldn't even do Hello World.
>
> So, I thought I'd start from the beginning. I tried running:
>
> configure --disable-wrapper-rpath CC=pgcc CXX=pgc++ FC=pgfortran
> --prefix=/Users/mathomp4/installed/Compiler/pgi-16.10/openmpi/2.1.0
> but when I did I saw this:
>
> *** C++ compiler and preprocessor
> checking whether we are using the GNU C++ compiler... yes
> checking whether pgc++ accepts -g... yes
> checking dependency style of pgc++... none
> checking how to run the C++ preprocessor... pgc++ -E
> checking for the C++ compiler vendor... gnu
>
> Well, that's not the right vendor. So, I took a look at configure and I
> saw that at least some detection for PGI was a la:
>
>   pgCC* | pgcpp*)
> # Portland Group C++ compiler
> case `$CC -V` in
> *pgCC\ [1-5].* | *pgcpp\ [1-5].*)
>
>   pgCC* | pgcpp*)
> # Portland Group C++ compiler
> lt_prog_compiler_wl_CXX='-Wl,'
> lt_prog_compiler_pic_CXX='-fpic'
> lt_prog_compiler_static_CXX='-Bstatic'
> ;;
>
> Ah. PGI 16.9+ now use pgc++ to do C++ compiling, not pgcpp. So, I hacked
> configure so that references to pgCC (nonexistent on macOS) are gone and
> all pgcpp became pgc++, but:
>
> *** C++ compiler and preprocessor
> checking whether we are using the GNU C++ compiler... yes
> checking whether pgc++ accepts -g... yes
> checking dependency style of pgc++... none
> checking how to run the C++ preprocessor... pgc++ -E
> checking for the C++ compiler vendor... gnu
>
> Well, at this point, I think I'm stopping until I get help. Will this
> chunk of configure always return gnu for PGI? I know the C part returns
> 'portland group':
>
> *** C compiler and preprocessor
> checking for gcc... (cached) pgcc
> checking whether we are using the GNU C compiler... (cached) no
> checking whether pgcc accepts -g... (cached) yes
> checking for pgcc option to accept ISO C89... (cached) none needed
> checking whether pgcc understands -c and -o together... (cached) yes
> checking for pgcc option to accept ISO C99... none needed
> checking for the C compiler vendor... portland group
>
> so I thought the C++ section would as well. I also tried passing in
> --enable-mpi-cxx, but that did nothing.
>
> Is this just a red herring? My real concern is with pgfortran/mpifort, but
> I thought I'd start with this. If this is okay, I'll move on and detail the
> fortran issues I'm having.
>
> Matt
> --
> Matt Thompson
>
> Man Among Men
> Fulcrum of History
>
>

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Gilles,

On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet
 wrote:
> mpirun --mca btl openib,self ...

Looks like this didn't finish; I had to terminate the job during the
Gather step with 32 processes.

> Then can you try
> mpirun --mca coll ^tuned --mca btl tcp,self ...
As mentioned, this didn't produce any program output, just the errors I quoted earlier.

I have also tried mpirun --mca coll ^tuned --mca btl tcp,openib; this
finished fine, but was quite slow. I am currently testing with mpirun
--mca coll ^tuned.

Regards, Götz

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
On Thu, Mar 23, 2017 at 2:37 PM, Götz Waschk  wrote:
> I have also tried mpirun --mca coll ^tuned --mca btl tcp,openib , this
> finished fine, but was quite slow. I am currently testing with mpirun
> --mca coll ^tuned

This one also ran fine.

[OMPI users] more migrating to MPI_F08

2017-03-23 Thread Tom Rosmond

Hello,

Attached is a simple MPI program demonstrating a problem I have
encountered with 'MPI_Type_create_hindexed' when compiling with the
'mpi_f08' module.  There are two blocks of code that differ only
in how the length and displacement arrays are declared.  I get


indx.f90(50): error #6285: There is no matching specific subroutine for 
this generic subroutine call.   [MPI_TYPE_CREATE_HINDEXED]

  call mpi_type_create_hindexed(lenidx,ijlena_2d,ijdispl_2d, &

for a case where the length and displacement arrays are 2-dimensional.
As far as I can find, there is nothing in the MPI-3.1 standard that
requires these arrays to be 1-dimensional. Am I missing something, or
is this an Open MPI bug?  I have been running successful programs with
multi-dimensional versions of these arrays for years when compiling
with 'mpif.h'.


T. Rosmond


  program hindx
 
  use mpi_f08

  implicit none
 
  integer i,lenidx,ierr,irank,nproc,lint,ibyte

  integer, dimension(:), allocatable :: ijlena_1d
  integer, dimension(:,:), allocatable :: ijlena_2d

  integer(kind=MPI_ADDRESS_KIND), dimension(:), allocatable :: ijdispl_1d
  integer(kind=MPI_ADDRESS_KIND), dimension(:,:), allocatable :: ijdispl_2d

  type(mpi_status) :: status

  type(mpi_datatype) :: ij_vector_type_1d
  type(mpi_datatype) :: ij_vector_type_2d
 
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD,irank,ierr)
  call MPI_Comm_size(MPI_COMM_WORLD,nproc,ierr)

  call mpi_sizeof(i,ibyte,ierr)

! 1-D case (compiles successfully)

  lenidx= 10
  allocate(ijlena_1d(lenidx))
  allocate(ijdispl_1d(lenidx))

  do i=1,lenidx
  ijlena_1d(i)= 1
  ijdispl_1d(i)= (i-1)*ibyte
  enddo

  call mpi_type_create_hindexed(lenidx,ijlena_1d,ijdispl_1d, &
mpi_real,ij_vector_type_1d,ierr)
  call mpi_type_commit(ij_vector_type_1d,ierr)

! 2-D case (fails to compile)

  lenidx= 10
  allocate(ijlena_2d(lenidx,1))
  allocate(ijdispl_2d(lenidx,1))

  do i=1,lenidx
  ijlena_2d(i,1)= 1
  ijdispl_2d(i,1)= (i-1)*ibyte
  enddo

  call mpi_type_create_hindexed(lenidx,ijlena_2d,ijdispl_2d, &
mpi_real,ij_vector_type_2d,ierr)
  call mpi_type_commit(ij_vector_type_2d,ierr)
 
  call mpi_finalize(ierr)  
  end  

[OMPI users] openmpi installation error

2017-03-23 Thread Vinay Mittal
I need mpirun to run a genome assembler.

The Linux installation of openmpi-2.1.0 stops during 'make all', saying:

"Perl 5.006 required--this is only version 5.00503, stopped at
/usr/share/perl5/vars.pm line 3."

Is it really that Perl-specific? I am following the standard installation
path, without root access.

Thanks.


Vinay K. Mittal

Re: [OMPI users] openmpi installation error

2017-03-23 Thread Jeff Squyres (jsquyres)
That's a pretty weird error.  We don't require any specific version of perl 
that I'm aware of.  Are you sure that it's Open MPI's installer that is kicking 
out the error?

Can you send all the information listed here:

https://www.open-mpi.org/community/help/


> On Mar 23, 2017, at 1:39 PM, Vinay Mittal  wrote:
> 
> I need mpirun to run a genome assembler.
> 
> Linux installation of openmpi-2.1.0 stops during make all saying:
> 
> "Perl 5.006 required--this is only version 5.00503, stopped at 
> /usr/share/perl5/vars.pm line 3."
> 
> Is it really that Perl specific? I am following the standard installation 
> path without root access.
> 
> Thanks.
> 
> 
> Vinay K. Mittal


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] more migrating to MPI_F08

2017-03-23 Thread Jeff Squyres (jsquyres)
Actually, MPI-3.1 p90:37-45 explicitly says that the array_of_blocklengths and 
array_of_displacements arrays must be both 1D and of length count.

If my Fortran memory serves me correctly, I think you can pass in an array 
subsection if your blocklengths/displacements are part of a larger array (the 
compiler should be smart enough to do that without forcing a copy, when 
possible).
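
For example, with the 2-D arrays from your attached program, I *think* passing
contiguous 1-D column sections would satisfy the interface (an untested
sketch, reusing the variable names from your program):

  ! A column section is contiguous, so no temporary copy should be needed.
  call mpi_type_create_hindexed(lenidx, ijlena_2d(:,1), ijdispl_2d(:,1), &
        mpi_real, ij_vector_type_2d, ierr)
  call mpi_type_commit(ij_vector_type_2d, ierr)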

FWIW: one of the design goals of the mpi_f08 module was to bring to light 
previous practices that people used in their applications with mpif.h that, 
although they worked, probably were not really correct Fortran / worked kinda 
by accident.  Recall that F08 is very, very strongly typed (even more so than 
C++).  Meaning: being picky about 1D-and-a-specific-length is a *feature*!
(yeah, it's kind of a PITA, but it really does help prevent bugs)



> On Mar 23, 2017, at 1:06 PM, Tom Rosmond  wrote:
> 
> Hello,
> 
> Attached is a simple MPI program demonstrating a problem I have encountered 
> with 'MPI_Type_create_hindexed' when compiling with the 'mpi_f08' module.  
> There are 2  blocks of code that are only different in how the length and 
> displacement arrays are declared.  I get
> 
> indx.f90(50): error #6285: There is no matching specific subroutine for this 
> generic subroutine call.   [MPI_TYPE_CREATE_HINDEXED]
>  call mpi_type_create_hindexed(lenidx,ijlena_2d,ijdispl_2d, &
> 
> for a case where the length and displacement arrays are 2-dimensional.  As 
> far as I can find, there is nothing in the MPI-3.1 standard that requires 
> that these arrays be 1-dimensional. Am I missing something, or is this a 
> OPEN-MPI bug?  I have been running successful programs with multi-dimensional 
> versions of these arrays for years when compiling with 'mpif.h'.
> 
> T. Rosmond
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] openmpi installation error

2017-03-23 Thread Renato Golin
On 23 March 2017 at 17:39, Vinay Mittal  wrote:
> I need mpirun to run a genome assembler.
>
> Linux installation of openmpi-2.1.0 stops during make all saying:
>
> "Perl 5.006 required--this is only version 5.00503, stopped at
> /usr/share/perl5/vars.pm line 3."

This looks like Perl's own verification process (inside vars.pm), and
a broken one at that.

Perl 5.005 is really old (1998) and about the oldest Perl you should
possibly be using. A quick look at the Perl history [1] shows that there
is no such thing as Perl 5.006!

I suggest you clean up and upgrade your Perl installation before
trying to install OpenMPI again. :)

cheers,
--renato

[1] http://perldoc.perl.org/perlhist.html


Re: [OMPI users] more migrating to MPI_F08

2017-03-23 Thread Tom Rosmond

Thanks, Jeff,

I had stared at those lines many times, and it didn't register that
"(count)" explicitly specifies that only 1-D is allowed.  Pretty cryptic.
I wonder how many other Fortran programmers will be bitten by this?


T. Rosmond



On 03/23/2017 10:50 AM, Jeff Squyres (jsquyres) wrote:

Actually, MPI-3.1 p90:37-45 explicitly says that the array_of_blocklengths and 
array_of_displacements arrays must be both 1D and of length count.

If my Fortran memory serves me correctly, I think you can pass in an array 
subsection if your blocklengths/displacements are part of a larger array (the 
compiler should be smart enough to do that without forcing a copy, when 
possible).

FWIW: one of the design goals of the mpi_f08 module was to bring to light 
previous practices that people used in their applications with mpif.h that, 
although they worked, probably were not really correct Fortran / worked kinda 
by accident.  Recall that F08 is very, very strongly typed (even more so than 
C++).  Meaning: being picky about 1D-and-a-specific-length is a *feature*!
(yeah, it's kind of a PITA, but it really does help prevent bugs)




On Mar 23, 2017, at 1:06 PM, Tom Rosmond  wrote:

Hello,

Attached is a simple MPI program demonstrating a problem I have encountered 
with 'MPI_Type_create_hindexed' when compiling with the 'mpi_f08' module.  
There are 2  blocks of code that are only different in how the length and 
displacement arrays are declared.  I get

indx.f90(50): error #6285: There is no matching specific subroutine for this 
generic subroutine call.   [MPI_TYPE_CREATE_HINDEXED]
  call mpi_type_create_hindexed(lenidx,ijlena_2d,ijdispl_2d, &

for a case where the length and displacement arrays are 2-dimensional.  As far 
as I can find, there is nothing in the MPI-3.1 standard that requires that 
these arrays be 1-dimensional. Am I missing something, or is this a OPEN-MPI 
bug?  I have been running successful programs with multi-dimensional versions 
of these arrays for years when compiling with 'mpif.h'.

T. Rosmond








Re: [OMPI users] more migrating to MPI_F08

2017-03-23 Thread Jeff Squyres (jsquyres)
On Mar 23, 2017, at 3:20 PM, Tom Rosmond  wrote:
> 
> I had stared at those lines many times and it didn't register that (count) 
> was explicitly specifying only 1-D is allowed.  Pretty cryptic. I wonder how 
> many other fortran programmers will be bit by this?

My understanding is that that is standard Fortran notation (e.g., it's how we 
declare them in the mpi_f08 module in Open MPI).
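
Roughly, the F08 binding is declared like this (a paraphrase of the MPI-3.1
interface, not a copy of Open MPI's actual source):

  interface
     subroutine MPI_Type_create_hindexed(count, array_of_blocklengths, &
                array_of_displacements, oldtype, newtype, ierror)
        use mpi_f08, only : MPI_Datatype, MPI_ADDRESS_KIND
        integer, intent(in) :: count
        ! Explicit shape "(count)": 1-D, exactly count elements.
        integer, intent(in) :: array_of_blocklengths(count)
        integer(kind=MPI_ADDRESS_KIND), intent(in) :: array_of_displacements(count)
        type(MPI_Datatype), intent(in) :: oldtype
        type(MPI_Datatype), intent(out) :: newtype
        integer, optional, intent(out) :: ierror
     end subroutine MPI_Type_create_hindexed
  end interface

It's that explicit-shape "(count)" on the dummy arguments that restricts the
actual arguments to 1-D arrays (or 1-D sections) of exactly that length.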

Yes, it might be a bit cryptic for the uninitiated, but -- for better or for 
worse -- that's how the language is defined.

-- 
Jeff Squyres
jsquy...@cisco.com
