Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
Since I'm seeing similar bus errors from both Open MPI and other places on our
system, I'm wondering: what hardware do you have? CPUs, interconnect, etc.

On 03/23/2017 08:45 AM, Götz Waschk wrote:
> Hi Howard,
>
> I have attached my config.log file for version 2.1.0. I have based it
> on the OpenHPC package. Unfortunately, it still crashes even with the
> vader btl disabled, using this command line:
> mpirun --mca btl "^vader" IMB-MPI1
>
> [pax11-10:44753] *** Process received signal ***
> [pax11-10:44753] Signal: Bus error (7)
> [pax11-10:44753] Signal code: Non-existant physical address (2)
> [pax11-10:44753] Failing at address: 0x2b3989e27a00

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134   Fax: +46 90-580 14
Mobile: +46 70 7716134        WWW: http://www.hpc2n.umu.se
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
Hi Åke,

I have E5-2697A CPUs and Mellanox ConnectX-3 FDR InfiniBand. I'm using
EL7.3 as the operating system.

Regards, Götz Waschk

On Thu, Mar 23, 2017 at 9:28 AM, Åke Sandgren wrote:
> Since I'm seeing similar bus errors from both Open MPI and other places
> on our system, I'm wondering: what hardware do you have?
> CPUs, interconnect, etc.
> [...]

--
AL I:40: Do what thou wilt shall be the whole of the Law.
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
Hi Howard,

I had tried to send the config.log of my 2.1.0 build, but I guess it was too
big for the list; I'm trying again with a compressed file. I have based the
build on the OpenHPC package. Unfortunately, it still crashes even with the
vader btl disabled, using this command line:

  mpirun --mca btl "^vader" IMB-MPI1

[pax11-10:44753] *** Process received signal ***
[pax11-10:44753] Signal: Bus error (7)
[pax11-10:44753] Signal code: Non-existant physical address (2)
[pax11-10:44753] Failing at address: 0x2b3989e27a00
[pax11-10:44753] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b3976f44370]
[pax11-10:44753] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(+0x559a)[0x2b398545259a]
[pax11-10:44753] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x1df)[0x2b39777bb78f]
[pax11-10:44753] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2b3985450562]
[pax11-10:44753] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2b3985d78a3f]
[pax11-10:44753] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2b3985d79ad7]
[pax11-10:44753] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2b3976cda620]
[pax11-10:44753] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2b3976cdb8f0]
[pax11-10:44753] [ 8] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2b3976ca36ab]
[pax11-10:44753] [ 9] IMB-MPI1[0x40b2ff]
[pax11-10:44753] [10] IMB-MPI1[0x402646]
[pax11-10:44753] [11] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b3977172b35]
[pax11-10:44753] [12] IMB-MPI1[0x401f79]
[pax11-10:44753] *** End of error message ***
[pax11-10:44752] *** Process received signal ***
[pax11-10:44752] Signal: Bus error (7)
[pax11-10:44752] Signal code: Non-existant physical address (2)
[pax11-10:44752] Failing at address: 0x2ab0d270d3e8
[pax11-10:44752] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab0bf7ec370]
[pax11-10:44752] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_allocator_bucket.so(mca_allocator_bucket_alloc_align+0x89)[0x2ab0c2eed1c9]
[pax11-10:44752] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmca_common_sm.so.20(+0x1495)[0x2ab0cde8d495]
[pax11-10:44752] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x277)[0x2ab0c0063827]
[pax11-10:44752] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2ab0cdc87562]
[pax11-10:44752] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2ab0ce630a3f]
[pax11-10:44752] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2ab0ce631ad7]
[pax11-10:44752] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2ab0bf582620]
[pax11-10:44752] [ 8] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2ab0bf5838f0]
[pax11-10:44752] [ 9] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2ab0bf54b6ab]
[pax11-10:44752] [10] IMB-MPI1[0x40b2ff]
[pax11-10:44752] [11] IMB-MPI1[0x402646]
[pax11-10:44752] [12] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab0bfa1ab35]
[pax11-10:44752] [13] IMB-MPI1[0x401f79]
[pax11-10:44752] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 340 with PID 44753 on node pax11-10
exited on signal 7 (Bus error).
--------------------------------------------------------------------------

[Attachment: config.log.xz (application/xz)]
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
E5-2697A: which version? v4?

On 03/23/2017 09:53 AM, Götz Waschk wrote:
> Hi Åke,
>
> I have E5-2697A CPUs and Mellanox ConnectX-3 FDR InfiniBand. I'm using
> EL7.3 as the operating system.

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134   Fax: +46 90-580 14
Mobile: +46 70 7716134        WWW: http://www.hpc2n.umu.se
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
On Thu, Mar 23, 2017 at 9:59 AM, Åke Sandgren wrote:
> E5-2697A: which version? v4?

Hi, yes, that one: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz

Regards, Götz
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
OK, we have E5-2690v4s and Connect-IB.

On 03/23/2017 10:11 AM, Götz Waschk wrote:
> On Thu, Mar 23, 2017 at 9:59 AM, Åke Sandgren wrote:
>> E5-2697A: which version? v4?
> Hi, yes, that one:
> Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
>
> Regards, Götz

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134   Fax: +46 90-580 14
Mobile: +46 70 7716134        WWW: http://www.hpc2n.umu.se
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
Can you please try

  mpirun --mca btl tcp,self ...

and, if it works,

  mpirun --mca btl openib,self ...

Then can you try

  mpirun --mca coll ^tuned --mca btl tcp,self ...

That will help figure out whether the error is in the pml or the coll
framework/module.

Cheers,

Gilles

On Thursday, March 23, 2017, Götz Waschk wrote:
> Hi Howard,
>
> I have attached my config.log file for version 2.1.0. I have based it
> on the OpenHPC package. Unfortunately, it still crashes even with the
> vader btl disabled, using this command line:
> mpirun --mca btl "^vader" IMB-MPI1
>
> [pax11-10:44753] *** Process received signal ***
> [pax11-10:44753] Signal: Bus error (7)
> [pax11-10:44753] Signal code: Non-existant physical address (2)
> [pax11-10:44753] Failing at address: 0x2b3989e27a00
> [...]
> --------------------------------------------------------------------------
> mpirun noticed that process rank 340 with PID 44753 on node pax11-10
> exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
[OMPI users] Errors and segmentation faults when installing openmpi-2.1
Hello,

I am setting up a freshly installed Ubuntu 16.04 computer to do some parallel
programming and I need the MPI compilers for C and Fortran. Using the tar
archive provided on the download page produces a series of errors (a very
long list, because I tried running make all many times, but I can attach it
if necessary). Instead, I tried just "sudo apt install libmpich-dev", but
again a segmentation fault came up:

Get:4 http://www.nic.funet.fi/pub/mirrors/archive.ubuntu.com xenial/universe amd64 mpich amd64 3.2-6build1 [197 kB]
Fetched 2 427 kB in 0s (4 238 kB/s)
Selecting previously unselected package hwloc-nox.
(Reading database ... 212998 files and directories currently installed.)
Preparing to unpack .../hwloc-nox_1.11.2-3_amd64.deb ...
Unpacking hwloc-nox (1.11.2-3) ...
Selecting previously unselected package libmpich12:amd64.
Preparing to unpack .../libmpich12_3.2-6build1_amd64.deb ...
Unpacking libmpich12:amd64 (3.2-6build1) ...
Selecting previously unselected package libmpich-dev.
Preparing to unpack .../libmpich-dev_3.2-6build1_amd64.deb ...
Unpacking libmpich-dev (3.2-6build1) ...
Selecting previously unselected package mpich.
Preparing to unpack .../mpich_3.2-6build1_amd64.deb ...
Unpacking mpich (3.2-6build1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for libc-bin (2.23-0ubuntu7) ...
Setting up openmpi (2.1.0-2) ...
chown: invalid user: ‘jsquyres:named’
chown: invalid user: ‘jsquyres:named’
dpkg: error processing package openmpi (--configure):
 subprocess installed post-installation script returned error exit status 1
Setting up hwloc-nox (1.11.2-3) ...
Setting up libmpich12:amd64 (3.2-6build1) ...
Setting up libmpich-dev (3.2-6build1) ...
update-alternatives: using /usr/include/mpich to provide /usr/include/mpi (mpi) in auto mode
Setting up mpich (3.2-6build1) ...
Processing triggers for libc-bin (2.23-0ubuntu7) ...
Errors were encountered while processing:
 openmpi

I suspect that the problem does not originate specifically from Open MPI, but
so far all other packages that I have installed have been functioning
properly. Could you suggest some solution? Thank you in advance.

Best regards,
Maria
Re: [OMPI users] Errors and segmentation faults when installing openmpi-2.1
Note that Open MPI and MPICH are different implementations of the MPI
specification. If you are mixing an Open MPI tarball install with an MPICH
apt install, things will likely go downhill from there.

You need to ensure that you use Open MPI *or* MPICH.

> On Mar 23, 2017, at 5:38 AM, Dimitrova, Maria wrote:
>
> Hello,
>
> I am setting up a freshly installed Ubuntu 16.04 computer to do some
> parallel programming and I need the MPI compilers for C and Fortran.
> [...]

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] a question about MPI dynamic process management
It's likely a lot more efficient to MPI_COMM_SPAWN *all* of your children at
once, and then subdivide the resulting newcomm communicator as desired.

It is *possible* to have a series of MPI_COMM_SPAWN calls that each spawn a
single child process, and then later join all of those children into a single
communicator, but it is somewhat tricky and likely not worth it (i.e., you'll
save a lot of code complexity if you can spawn all the children at once).

> On Mar 23, 2017, at 12:23 AM, gzzh...@buaa.edu.cn wrote:
>
> Hi team:
> I have a question about MPI dynamic process management; I hope you can
> provide some help.
> First of all, the MPI program runs on multiple nodes. The group of
> MPI_COMM_WORLD was split into subgroups by node, and sub-communicators
> were created respectively, so that MPI processes in one node can
> communicate with each other through these sub-communicators.
> Then, I used
> MPI_Comm_spawn("./child", NULL, 1, hostinfo, 0, sub-communicator, &newcomm, &errs)
> to spawn one child process in each node. The child processes were expected
> to form a group and further to create an intra-communicator, so that
> message passing can be done between these child processes.
> The question is how can I achieve that? Or do I have to use
> MPI_Comm_accept & MPI_Comm_connect to establish a connection?
>
> best regards!
>
> -
> Eric

--
Jeff Squyres
jsquy...@cisco.com
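A minimal sketch of the spawn-all-at-once approach, written with the mpi_f08
bindings to match the other Fortran examples in this digest; nchildren,
allcomm, and the "./child" executable name are placeholders, and error
handling is omitted:

  program spawn_parent
    use mpi_f08
    implicit none
    integer :: nchildren
    type(MPI_Comm) :: intercomm, allcomm

    call MPI_Init()

    ! Spawn all children in one collective call over MPI_COMM_WORLD.
    ! An MPI_Info object with placement keys (e.g. "host") can replace
    ! MPI_INFO_NULL to put one child on each node.
    nchildren = 4   ! placeholder: e.g. one child per node
    call MPI_Comm_spawn("./child", MPI_ARGV_NULL, nchildren, MPI_INFO_NULL, &
                        0, MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE)

    ! Optionally merge parents and children into one intra-communicator,
    ! which can then be subdivided with MPI_Comm_split as desired.
    call MPI_Intercomm_merge(intercomm, .false., allcomm)

    call MPI_Finalize()
  end program spawn_parent

On the child side, the spawned processes already share their own
MPI_COMM_WORLD, and MPI_Comm_get_parent returns the intercommunicator back to
the parents, so no MPI_Comm_accept/MPI_Comm_connect handshake is needed.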
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
Hi Gilles,

I'm currently testing, and here are some preliminary results:

On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet wrote:
> Can you please try
> mpirun --mca btl tcp,self ...

This failed to produce the program output; there were lots of errors like
this:

[pax11-00][[54124,1],31][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.225.202 failed: Connection timed out (110)

I had to terminate the job. That's why I added the option
--mca btl_tcp_if_exclude ib0. In this case, the program started to produce
output, but began hanging early on with this error:

[pax11-00][[61232,1],31][btl_tcp_endpoint.c:803:mca_btl_tcp_endpoint_complete_connect] connect() to 127.0.0.1 failed: Connection refused (111)
[pax11-01][[61232,1],63][btl_tcp_endpoint.c:649:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[61232,1],33]

I aborted that job as well.

> And if it works
> mpirun --mca btl openib,self ...

This is running fine so far, but will take some more time.

Regards, Götz
Re: [OMPI users] Help with Open MPI 2.1.0 and PGI 16.10: Configure and C++
Hi,

On 22.03.2017 at 20:12, Matt Thompson wrote:
> […]
>
> Ah. PGI 16.9+ now use pgc++ to do C++ compiling, not pgcpp. So, I hacked
> configure so that references to pgCC (nonexistent on macOS) are gone and
> all pgcpp became pgc++, but:

This is not unique to macOS. pgCC used the STLPort STL and is no longer
included with their compiler suite; pgc++ now uses a GCC-compatible library
format and replaces the former one on Linux too. There I get, ignoring the
"gnu" output during `configure` and compiling anyway:

$ mpic++ --version
pgc++ 16.10-0 64-bit target on x86-64 Linux -tp bulldozer
The Portland Group - PGI Compilers and Tools
Copyright (c) 2016, NVIDIA CORPORATION.  All rights reserved.

Maybe some options for the `mpic++` wrapper were just set in a wrong way?

===

Nevertheless: did you see the error on the Mac at the end of the `configure`
step too, or was it gone after the hints in the discussion link you posted?
As I see it, there is still one there about "libevent".

-- Reuti

> *** C++ compiler and preprocessor
> checking whether we are using the GNU C++ compiler... yes
> checking whether pgc++ accepts -g... yes
> checking dependency style of pgc++... none
> checking how to run the C++ preprocessor... pgc++ -E
> checking for the C++ compiler vendor... gnu
> [...]
Re: [OMPI users] Help with Open MPI 2.1.0 and PGI 16.10: Configure and C++
Matt,

A C++ compiler is required to configure Open MPI. That being said, the C++
compiler is only used if you build the C++ bindings (which were removed in
MPI-3).

And unless you plan to use the mpic++ wrapper (with or without the C++
bindings), a valid C++ compiler is not required at all.
/* configure still requires one, and that could be improved */

My point is: you should not worry too much about the configure messages
related to C++, and you should instead focus on the Fortran issue.

Cheers,

Gilles

On Thursday, March 23, 2017, Matt Thompson wrote:
> All, I'm hoping one of you knows what I might be doing wrong here. I'm
> trying to use Open MPI 2.1.0 for PGI 16.10 (Community Edition) on macOS.
> Now, I built it a la:
>
> http://www.pgroup.com/userforum/viewtopic.php?p=21105#21105
>
> and found that it built, but the resulting mpifort, etc. were just not
> good. Couldn't even do Hello World.
>
> So, I thought I'd start from the beginning. I tried running:
>
> configure --disable-wrapper-rpath CC=pgcc CXX=pgc++ FC=pgfortran
>   --prefix=/Users/mathomp4/installed/Compiler/pgi-16.10/openmpi/2.1.0
>
> but when I did I saw this:
>
> *** C++ compiler and preprocessor
> checking whether we are using the GNU C++ compiler... yes
> checking whether pgc++ accepts -g... yes
> checking dependency style of pgc++... none
> checking how to run the C++ preprocessor... pgc++ -E
> checking for the C++ compiler vendor... gnu
>
> Well, that's not the right vendor. So, I took a look at configure and I
> saw that at least some detection for PGI was a la:
>
> pgCC* | pgcpp*)
>   # Portland Group C++ compiler
>   case `$CC -V` in
>   *pgCC\ [1-5].* | *pgcpp\ [1-5].*)
>
> pgCC* | pgcpp*)
>   # Portland Group C++ compiler
>   lt_prog_compiler_wl_CXX='-Wl,'
>   lt_prog_compiler_pic_CXX='-fpic'
>   lt_prog_compiler_static_CXX='-Bstatic'
>   ;;
>
> Ah. PGI 16.9+ now use pgc++ to do C++ compiling, not pgcpp. So, I hacked
> configure so that references to pgCC (nonexistent on macOS) are gone and
> all pgcpp became pgc++, but:
>
> *** C++ compiler and preprocessor
> checking whether we are using the GNU C++ compiler... yes
> checking whether pgc++ accepts -g... yes
> checking dependency style of pgc++... none
> checking how to run the C++ preprocessor... pgc++ -E
> checking for the C++ compiler vendor... gnu
>
> Well, at this point, I think I'm stopping until I get help. Will this
> chunk of configure always return gnu for PGI? I know the C part returns
> 'portland group':
>
> *** C compiler and preprocessor
> checking for gcc... (cached) pgcc
> checking whether we are using the GNU C compiler... (cached) no
> checking whether pgcc accepts -g... (cached) yes
> checking for pgcc option to accept ISO C89... (cached) none needed
> checking whether pgcc understands -c and -o together... (cached) yes
> checking for pgcc option to accept ISO C99... none needed
> checking for the C compiler vendor... portland group
>
> so I thought the C++ section would as well. I also tried passing in
> --enable-mpi-cxx, but that did nothing.
>
> Is this just a red herring? My real concern is with pgfortran/mpifort,
> but I thought I'd start with this. If this is okay, I'll move on and
> detail the Fortran issues I'm having.
>
> Matt
> --
> Matt Thompson
> Man Among Men
> Fulcrum of History
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
Hi Gilles,

On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet wrote:
> mpirun --mca btl openib,self ...

Looks like this didn't finish; I had to terminate the job during the Gather
with 32 processes step.

> Then can you try
> mpirun --mca coll ^tuned --mca btl tcp,self ...

As mentioned, this didn't produce any program output, just the errors
mentioned before.

I have also tried mpirun --mca coll ^tuned --mca btl tcp,openib; this
finished fine, but was quite slow. I am currently testing with
mpirun --mca coll ^tuned.

Regards, Götz
Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes
On Thu, Mar 23, 2017 at 2:37 PM, Götz Waschk wrote:
> I have also tried mpirun --mca coll ^tuned --mca btl tcp,openib; this
> finished fine, but was quite slow. I am currently testing with
> mpirun --mca coll ^tuned.

This one also ran fine.
[OMPI users] more migrating to MPI_F08
Hello,

Attached is a simple MPI program demonstrating a problem I have encountered
with 'MPI_Type_create_hindexed' when compiling with the 'mpi_f08' module.
There are 2 blocks of code that are only different in how the length and
displacement arrays are declared. I get

indx.f90(50): error #6285: There is no matching specific subroutine for this
generic subroutine call.   [MPI_TYPE_CREATE_HINDEXED]
      call mpi_type_create_hindexed(lenidx,ijlena_2d,ijdispl_2d, &

for the case where the length and displacement arrays are 2-dimensional. As
far as I can find, there is nothing in the MPI-3.1 standard that requires
that these arrays be 1-dimensional. Am I missing something, or is this an
Open MPI bug? I have been running successful programs with multi-dimensional
versions of these arrays for years when compiling with 'mpif.h'.

T. Rosmond


      program hindx
      use mpi_f08
      implicit none

      integer i,lenidx,ierr,irank,nproc,lint,ibyte
      integer, dimension(:), allocatable :: ijlena_1d
      integer, dimension(:,:), allocatable :: ijlena_2d
      integer(kind=MPI_ADDRESS_KIND), dimension(:), allocatable :: ijdispl_1d
      integer(kind=MPI_ADDRESS_KIND), dimension(:,:), allocatable :: ijdispl_2d

      type(mpi_status) :: status
      type(mpi_datatype) :: ij_vector_type_1d
      type(mpi_datatype) :: ij_vector_type_2d

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD,irank,ierr)
      call MPI_Comm_size(MPI_COMM_WORLD,nproc,ierr)
      call mpi_sizeof(i,ibyte,ierr)

! 1-D case (compile success)

      lenidx= 10
      allocate(ijlena_1d(lenidx))
      allocate(ijdispl_1d(lenidx))

      do i=1,lenidx
        ijlena_1d(i)= 1
        ijdispl_1d(i)= (i-1)*ibyte
      enddo

      call mpi_type_create_hindexed(lenidx,ijlena_1d,ijdispl_1d, &
                                    mpi_real,ij_vector_type_1d,ierr)
      call mpi_type_commit(ij_vector_type_1d,ierr)

! 2-D case (compile failure)

      lenidx= 10
      allocate(ijlena_2d(lenidx,1))
      allocate(ijdispl_2d(lenidx,1))

      do i=1,lenidx
        ijlena_2d(i,1)= 1
        ijdispl_2d(i,1)= (i-1)*ibyte
      enddo

      call mpi_type_create_hindexed(lenidx,ijlena_2d,ijdispl_2d, &
                                    mpi_real,ij_vector_type_2d,ierr)
      call mpi_type_commit(ij_vector_type_2d,ierr)

      call mpi_finalize(ierr)
      end
[OMPI users] openmpi installation error
I need mpirun to run a genome assembler.

Linux installation of openmpi-2.1.0 stops during "make all" saying:

"Perl 5.006 required--this is only version 5.00503, stopped at
/usr/share/perl5/vars.pm line 3."

Is it really that Perl-specific? I am following the standard installation
path without root access.

Thanks.

Vinay K. Mittal
Re: [OMPI users] openmpi installation error
That's a pretty weird error. We don't require any specific version of Perl
that I'm aware of.

Are you sure that it's Open MPI's installer that is kicking out the error?

Can you send all the information listed here:
https://www.open-mpi.org/community/help/

> On Mar 23, 2017, at 1:39 PM, Vinay Mittal wrote:
>
> I need mpirun to run a genome assembler.
>
> Linux installation of openmpi-2.1.0 stops during "make all" saying:
>
> "Perl 5.006 required--this is only version 5.00503, stopped at
> /usr/share/perl5/vars.pm line 3."
>
> Is it really that Perl-specific? I am following the standard installation
> path without root access.
>
> Thanks.
>
> Vinay K. Mittal

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] more migrating to MPI_F08
Actually, MPI-3.1 p90:37-45 explicitly says that the array_of_blocklengths
and array_of_displacements arrays must both be 1D and of length count.

If my Fortran memory serves me correctly, I think you can pass in an array
subsection if your blocklengths/displacements are part of a larger array
(the compiler should be smart enough to do that without forcing a copy, when
possible).

FWIW: one of the design goals of the mpi_f08 module was to bring to light
previous practices that people used in their applications with mpif.h that,
although they worked, probably were not really correct Fortran / worked
kinda by accident. Recall that F08 is very, very strongly typed (even more
so than C++).

Meaning: being picky about 1D-and-a-specific-length is a *feature*! (Yeah,
it's kind of a PITA, but it really does help prevent bugs.)

> On Mar 23, 2017, at 1:06 PM, Tom Rosmond wrote:
>
> Attached is a simple MPI program demonstrating a problem I have
> encountered with 'MPI_Type_create_hindexed' when compiling with the
> 'mpi_f08' module.
> [...]

--
Jeff Squyres
jsquy...@cisco.com
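A minimal sketch of the array-subsection approach described above, adapted
from the posted hindx program (same 2-D allocations; the contiguous rank-1
sections ijlena_2d(:,1) and ijdispl_2d(:,1) match the (count)-shaped dummy
arguments of the mpi_f08 interface):

  program hindx_sections
    use mpi_f08
    implicit none
    integer :: i, lenidx, ibyte, ierr
    integer, allocatable :: ijlena_2d(:,:)
    integer(kind=MPI_ADDRESS_KIND), allocatable :: ijdispl_2d(:,:)
    type(MPI_Datatype) :: ij_vector_type_2d

    call MPI_Init(ierr)
    call MPI_Sizeof(i, ibyte, ierr)

    lenidx = 10
    allocate(ijlena_2d(lenidx,1), ijdispl_2d(lenidx,1))
    do i = 1, lenidx
       ijlena_2d(i,1)  = 1
       ijdispl_2d(i,1) = (i-1)*ibyte
    end do

    ! Pass contiguous rank-1 sections so the actual arguments match the
    ! (count)-shaped dummies required by the mpi_f08 interface.
    call MPI_Type_create_hindexed(lenidx, ijlena_2d(:,1), ijdispl_2d(:,1), &
                                  MPI_REAL, ij_vector_type_2d, ierr)
    call MPI_Type_commit(ij_vector_type_2d, ierr)

    call MPI_Type_free(ij_vector_type_2d, ierr)
    call MPI_Finalize(ierr)
  end program hindx_sections

With mpif.h there is no explicit interface, so the 2-D actual arguments
happened to work by sequence association (only the base address is passed);
the strongly typed mpi_f08 interfaces reject them at compile time, which is
the error reported above.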
Re: [OMPI users] openmpi installation error
On 23 March 2017 at 17:39, Vinay Mittal wrote:
> I need mpirun to run a genome assembler.
>
> Linux installation of openmpi-2.1.0 stops during "make all" saying:
>
> "Perl 5.006 required--this is only version 5.00503, stopped at
> /usr/share/perl5/vars.pm line 3."

This looks like Perl's own verification process (inside vars.pm), and a
broken one at that.

Perl 5.005 is really old ('98) and is the bare minimum Perl you should
possibly be using. A quick look at Perl history [1] shows that there is no
such thing as Perl 5.006!

I suggest you clean up and upgrade your Perl installation before trying to
install Open MPI again. :)

cheers,
--renato

[1] http://perldoc.perl.org/perlhist.html
Re: [OMPI users] more migrating to MPI_F08
Thanks, Jeff,

I had stared at those lines many times and it didn't register that (count)
was explicitly specifying that only 1-D is allowed. Pretty cryptic. I wonder
how many other Fortran programmers will be bitten by this?

T. Rosmond

On 03/23/2017 10:50 AM, Jeff Squyres (jsquyres) wrote:
> Actually, MPI-3.1 p90:37-45 explicitly says that the array_of_blocklengths
> and array_of_displacements arrays must both be 1D and of length count.
> [...]
Re: [OMPI users] more migrating to MPI_F08
On Mar 23, 2017, at 3:20 PM, Tom Rosmond wrote:
>
> I had stared at those lines many times and it didn't register that (count)
> was explicitly specifying that only 1-D is allowed. Pretty cryptic. I
> wonder how many other Fortran programmers will be bitten by this?

My understanding is that that is standard Fortran notation (e.g., it's how
we declare them in the mpi_f08 module in Open MPI). Yes, it might be a bit
cryptic for the uninitiated, but -- for better or for worse -- that's how
the language is defined.

--
Jeff Squyres
jsquy...@cisco.com