Re: [OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users
3 10:08 AM *To:* Jeff Squyres (jsquyres) ; Open MPI Users *Subject:* Re: [OMPI users] Segmentation fault Hi Jeff, I also tried with OpenMPI 4.1.5, I got the same error. On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote: I'm afraid I don't know anything about the SU2 application. You are u

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
simple MPI application that replicates the issue? That would be something we could dig into and investigate. From: Aziz Ogutlu Sent: Wednesday, August 9, 2023 10:31 AM To: Jeff Squyres (jsquyres) ; Open MPI Users Subject: Re: [OMPI users] Segmentation fault Hi J

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
ation. From: Aziz Ogutlu Sent: Wednesday, August 9, 2023 10:08 AM To: Jeff Squyres (jsquyres) ; Open MPI Users Subject: Re: [OMPI users] Segmentation fault Hi Jeff, I also tried with OpenMPI 4.1.5, I got the same error. On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote: I'

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users
ou upgrade to the latest version of Open MPI (v4.1.5)? *From:* users on behalf of Aziz Ogutlu via users *Sent:* Wednesday, August 9, 2023 3:26 AM *To:* Open MPI Users *Cc:* Aziz Ogutlu *Subject:* [OMPI users] Segmenta

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
lf of Aziz Ogutlu via users Sent: Wednesday, August 9, 2023 3:26 AM To: Open MPI Users Cc: Aziz Ogutlu Subject: [OMPI users] Segmentation fault Hi there all, We're using SU2 with OpenMPI 4.0.3, gcc 8.5.0 on Red Hat 7.9. We compiled all components for use on our HPC system. When I use SU2

[OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users
Hi there all, We're using SU2 with OpenMPI 4.0.3, gcc 8.5.0 on Red Hat 7.9. We compiled all components for use on our HPC system. When I use SU2 with the QuickStart config file with OpenMPI, it gives an error like the one in the attached file. The command is: mpirun -np 8 --allow-run-as-root SU2_CFD inv_NACA0012.cfg

Re: [OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Raymond Arter via users
Jeff and Steven, Thanks for your help. I downloaded the nightly snapshot and it fixes the problem. I need to do more testing tomorrow and I will report back if any issues arise. Thanks again. T. On 10/07/2019 18:44, Jeff Squyres (jsquyres) via users wrote: It might be worth trying the lates

Re: [OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Jeff Squyres (jsquyres) via users
It might be worth trying the latest v4.0.x nightly snapshot -- we just updated the internal PMIx on the v4.0.x branch: https://www.open-mpi.org/nightly/v4.0.x/ > On Jul 10, 2019, at 1:29 PM, Steven Varga via users > wrote: > > Hi i am fighting similar. Did you try to update the pmix most

Re: [OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Steven Varga via users
Hi, I am fighting something similar. Did you try updating PMIx to the most recent 3.1.3 series release? On Wed, Jul 10, 2019, 12:24 Raymond Arter via users, < users@lists.open-mpi.org> wrote: > Hi, > > I have the following issue with version 4.0.1 when running on a node with > two 16 core CPUs (Intel Xeon Gol

[OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Raymond Arter via users
Hi, I have the following issue with version 4.0.1 when running on a node with two 16 core CPUs (Intel Xeon Gold 6142) installed. Running with 30 ranks or less is fine, and running 33 or above gives the "not enough slots" message which is expected. However, using 31 or 32 ranks results in the foll

Re: [OMPI users] Segmentation fault using openmpi-master-201901030305-ee26ed9

2019-01-04 Thread Howard Pritchard
Hi Siegmar, I observed this problem yesterday myself and should have a fix in master later today. Howard On Fri., Jan. 4, 2019 at 05:30, Siegmar Gross < siegmar.gr...@informatik.hs-fulda.de> wrote: > Hi, > > I've installed (tried to install) openmpi-master-201901030305-ee26ed9 on > my "

[OMPI users] Segmentation fault using openmpi-master-201901030305-ee26ed9

2019-01-04 Thread Siegmar Gross
Hi, I've installed (tried to install) openmpi-master-201901030305-ee26ed9 on my "SUSE Linux Enterprise Server 12.3 (x86_64)" with gcc-7.3.0, icc-19.0.1.144 pgcc-18.4-0, and Sun C 5.15 (Oracle Developer Studio 12.6). Unfortunately, I still cannot build it with Sun C and I get a segmentation fault

[OMPI users] segmentation fault to use openMPI

2017-10-11 Thread RUI ZHANG
Hello everyone, I am trying to debug the MPI functionality on our local clusters. I use openmpi 3.0 and the executable was compiled by PGI 10.9. The executable is a regional air quality model called "CAMx" which is widely used in our community. In our local clusters setting, I have a clus

[OMPI users] [OMPI USERS] segmentation fault at startup

2017-09-08 Thread Alberto Ortiz
Hi, I have a system running openmpi programs over archlinux. I had the programs compiled and running in July when I was using version 1.10.4 or .7 of openmpi, if I remember correctly. Just recently I updated the openmpi version to 2.1.1, tried running a compiled program, and it ran correctly. The

Re: [OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21 Thread r...@open-mpi.org
Can you provide a backtrace with line numbers from a debug build? We don’t get much testing with lsf, so it is quite possible there is a bug in there. > On Feb 21, 2017, at 7:39 PM, Hammond, Simon David (-EXP) > wrote: > > Hi OpenMPI Users, > > Has anyone successfully tested OpenMPI 1.10.6 wi

[OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21 Thread Hammond, Simon David (-EXP)
Hi OpenMPI Users, Has anyone successfully tested OpenMPI 1.10.6 with PGI 17.1.0 on POWER8 with the LSF scheduler (--with-lsf=..)? I am getting this error when the code hits MPI_Finalize. It causes the job to abort (i.e. exit the LSF session) when I am running interactively. Are there any materi

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-03 Thread Siegmar Gross
Hi Howard, it still works with 4 processes and "vader" will not send the following output about missing communication peers if I start at least 2 processes. ... [loki:14965] select: initializing btl component vader [loki][[42444,1],0][../../../../../openmpi-2.0.2rc2/opal/mca/btl/vader/btl_vader_

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-03 Thread Howard Pritchard
HI Siegmar, Could you please rerun the spawn_slave program with 4 processes? Your original traceback indicates a failure in the barrier in the slave program. I'm interested in seeing if when you run the slave program standalone with 4 processes the barrier failure is observed. Thanks, Howard

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-02 Thread Siegmar Gross
Hi Howard, thank you very much for trying to solve my problem. I haven't changed the programs since 2013, so you are using the correct version. The program works as expected with the master trunk as you can see at the bottom of this email from my last mail. The slave program works when I launch i

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-02 Thread Howard Pritchard
HI Siegmar, I've attempted to reproduce this using gnu compilers and the version of this test program(s) you posted earlier in 2016 but am unable to reproduce the problem. Could you double check that the slave program can be successfully run when launched directly by mpirun/mpiexec? It might also

[OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2016-12-28 Thread Siegmar Gross
Hi, I have installed openmpi-2.0.2rc2 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.2.0. Unfortunately, I get an error when I run one of my programs. Everything works as expected with openmpi-master-201612232109-67a08e8. The program gets a timeout with openmpi-v2

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-23 Thread Howard Pritchard
Hi Paul, Thanks very much for the Christmas present. The Open MPI README has been updated to include a note about issues with the Intel 16.0.3-4 compiler suites. Enjoy the holidays, Howard 2016-12-23 3:41 GMT-07:00 Paul Kapinos : > Hi all, > > we discussed this issue with Intel compiler support and

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-23 Thread Paul Kapinos
Hi all, we discussed this issue with Intel compiler support and it looks like they now know what the issue is and how to protect against it. It is a known issue resulting from a backwards incompatibility in an OS/glibc update, cf. https://sourceware.org/bugzilla/show_bug.cgi?id=20019 Affected ver

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-14 Thread Paul Kapinos
Hello all, we seem to run into the same issue: 'mpif90' sigsegvs immediately for Open MPI 1.10.4 compiled using Intel compilers 16.0.4.258 and 16.0.3.210, while it works fine when compiled with 16.0.2.181. It seems to be a compiler issue (more exactly: library issue on libs delivered with 16.

Re: [OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)

2016-11-25 Thread George Bosilca
At the first glance I would say you are confusing the variables counting your requests, reqcount and nrequests. George. On Fri, Nov 25, 2016 at 7:11 AM, Paolo Pezzutto wrote: > Dear all, > > I am struggling with an invalid memory reference when calling SUB EXC_MPI > (MOD01), and precisely at
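
George's point generalizes: with persistent requests, the count handed to MPI_Startall and MPI_Waitall must be the same count used when the requests were initialized. A minimal sketch in C (the original code is Fortran; all names here are illustrative, not Paolo's):

    #include <mpi.h>

    /* Sketch only: one counter (nreq) is used both to initialize the
     * persistent requests and to start/wait on them, so the two counts
     * cannot drift apart. */
    int main(int argc, char **argv) {
        int rank, size, nreq = 0;
        MPI_Request reqs[2];
        double sbuf, rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        sbuf = (double)rank;

        int right = (rank + 1) % size, left = (rank + size - 1) % size;
        MPI_Send_init(&sbuf, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[nreq++]);
        MPI_Recv_init(&rbuf, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[nreq++]);

        for (int iter = 0; iter < 2; iter++) {   /* the second call is where the reported crash occurred */
            MPI_Startall(nreq, reqs);            /* same nreq that counted the *_init calls */
            MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
        }

        for (int i = 0; i < nreq; i++)
            MPI_Request_free(&reqs[i]);
        MPI_Finalize();
        return 0;
    }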

[OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)

2016-11-25 Thread Paolo Pezzutto
Dear all, I am struggling with an invalid memory reference when calling SUB EXC_MPI (MOD01), and precisely at MPI_StartAll (see comment) below. @@ ! ** file mod01.f90 ! MODULE MOD01 implicit none include 'mpif.h' ! alternat

[OMPI users] Segmentation fault with openmpi-v2.0.1-134-g52bea1d on SuSE Linux

2016-11-02 Thread Siegmar Gross
Hi, I have installed openmpi-v2.0.1-134-g52bea1d on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.2.0. Unfortunately, I get an error when I run one of my programs. loki spawn 149 ompi_info | grep -e "Open MPI:" -e "C compiler absolute:" Open MPI: 2.

[OMPI users] Segmentation fault for openmpi-2.0.1

2016-09-05 Thread Siegmar Gross
Hi, I have installed openmpi-2.0.1 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.1.0. Unfortunately, I still get the segmentation fault, which I reported for openmpi-v2.0.0-233-gb5f0a4f. I would be grateful, if somebody can fix the problem. Thank you very much

[OMPI users] Segmentation fault for openmpi-v2.0.0-233-gb5f0a4f with SuSE Linux

2016-08-28 Thread Siegmar Gross
Hi, I have installed openmpi-v2.0.0-233-gb5f0a4f on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.1.0. Unfortunately I have a problem with my program "spawn_master". It hangs if I run it on my local machine and I get a segmentation fault if I run it on a remote mach

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-27 Thread Siegmar Gross
Hi Ralph, On 26.05.2016 at 17:38, Ralph Castain wrote: I’m afraid I honestly can’t make any sense of it. It seems you at least have a simple workaround (use a hostfile instead of -host), yes? Only the combination of "--host" and "--slot-list" breaks. Everything else works as expected. One more

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-26 Thread Ralph Castain
I’m afraid I honestly can’t make any sense of it. It seems you at least have a simple workaround (use a hostfile instead of -host), yes? > On May 26, 2016, at 5:48 AM, Siegmar Gross > wrote: > > Hi Ralph and Gilles, > > it's strange that the program works with "--host" and "--slot-list" > in

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-26 Thread Siegmar Gross
Hi Ralph and Gilles, it's strange that the program works with "--host" and "--slot-list" in your environment and not in mine. I get the following output, if I run the program in gdb without a breakpoint. loki spawn 142 gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec GNU gdb (GDB; SUSE Linux En

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-25 Thread Siegmar Gross
Hi, I've updated to rc3 and still have the same error. Is the following output helpful to see what's going on on my machine? loki spawn 145 gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec GNU gdb (GDB; SUSE Linux Enterprise 12) 7.9.1 ... Reading symbols from /usr/local/openmpi-1.10.3_64_gcc/bi

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Ralph Castain
Works perfectly for me, so I believe this must be an environment issue - I am using gcc 6.0.0 on CentOS7 with x86: $ mpirun -n 1 -host bend001 --slot-list 0:0-1,1:0-1 --report-bindings ./simple_spawn [bend001:17599] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socke

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Siegmar Gross
Hi Ralph and Gilles, the program breaks only if I combine "--host" and "--slot-list". Perhaps this information is helpful. I use a different machine now, so that you can see that the problem is not restricted to "loki". pc03 spawn 115 ompi_info | grep -e "OPAL repo revision:" -e "C compiler a

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Ralph Castain
> On May 24, 2016, at 6:21 AM, Siegmar Gross > wrote: > > Hi Ralph, > > I copy the relevant lines to this place, so that it is easier to see what > happens. "a.out" is your program, which I compiled with mpicc. > > >> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C compiler > >

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Siegmar Gross
Hi Ralph, I copy the relevant lines to this place, so that it is easier to see what happens. "a.out" is your program, which I compiled with mpicc. >> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C compiler >> absolute:" >> OPAL repo revision: v1.10.2-201-gd23dda8 >> C co

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Jeff Squyres (jsquyres)
On May 24, 2016, at 7:19 AM, Siegmar Gross wrote: > > I don't see a difference for my spawned processes, because both functions will > "wait" until all pending operations have finished, before the object will be > destroyed. Nevertheless, perhaps my small example program worked all the years > b

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Ralph Castain
> On May 24, 2016, at 4:19 AM, Siegmar Gross > wrote: > > Hi Ralph, > > thank you very much for your answer and your example program. > > On 05/23/16 17:45, Ralph Castain wrote: >> I cannot replicate the problem - both scenarios work fine for me. I’m not >> convinced your test code is correct

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Siegmar Gross
Hi Ralph, thank you very much for your answer and your example program. On 05/23/16 17:45, Ralph Castain wrote: I cannot replicate the problem - both scenarios work fine for me. I’m not convinced your test code is correct, however, as you call Comm_free on the inter-communicator but didn’t call Co

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-23 Thread Ralph Castain
I cannot replicate the problem - both scenarios work fine for me. I’m not convinced your test code is correct, however, as you call Comm_free on the inter-communicator but didn’t call Comm_disconnect. Check out the attached for a correct code and see if it works for you. FWIW: I don’t know how many
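
For readers following along, a minimal sketch of the pattern Ralph describes (the slave binary name is hypothetical; this is not his attached example): disconnect the inter-communicator returned by MPI_Comm_spawn rather than only freeing it.

    #include <mpi.h>

    /* Sketch only; "spawn_slave" is a hypothetical binary name. */
    int main(int argc, char **argv) {
        MPI_Comm intercomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

        /* ... exchange messages over intercomm ... */

        /* Collective over both sides: waits for pending communication and
         * severs the connection; MPI_Comm_free alone does not do that. */
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }

The spawned processes would do the same on the communicator obtained from MPI_Comm_get_parent before their own MPI_Finalize.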

[OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-23 Thread Siegmar Gross
Hi, I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-6.1.0. Unfortunately I get a segmentation fault for "--slot-list" for one of my small programs. loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:" O

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-09 Thread Giacomo Rossi
I've sent you all the outputs from the configure, make and make install commands... Today I've compiled openmpi with the latest gcc version (6.1.1) shipped with my archlinux distro and everything seems ok, so I think that the problem is with the Intel compiler. Giacomo Rossi Ph.D., Space Engineer Resear

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Dave Love
Gus Correa writes: > Hi Giacomo > > Some programs fail with segmentation fault > because the stack size is too small. Yes, the default for Intel Fortran is to allocate large-ish amounts on the stack, which may matter when the compiled program runs. However, look at the backtrace. It's apparent

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Jeff Squyres (jsquyres)
Ok, good. I asked that question because typically when we see errors like this, it is usually either a busted compiler installation or inadvertently mixing the run-times of multiple different compilers in some kind of incompatible way. Specifically, the mpifort (aka mpif90) application is a fa

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Giacomo Rossi
Yes, I've tried three simple "Hello world" programs in Fortran, C and C++, and they compile and run with Intel 16.0.3. The problem is with the openmpi compiled from source. Giacomo Rossi Ph.D., Space Engineer Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza" University of

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Jeff Squyres (jsquyres)
Giacomo -- Are you able to run anything that is compiled by that Intel compiler installation? > On May 5, 2016, at 12:02 PM, Gus Correa wrote: > > Hi Giacomo > > Some programs fail with segmentation fault > because the stack size is too small. > [But others because of bugs in memory allocat

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gus Correa
Hi Giacomo Some programs fail with segmentation fault because the stack size is too small. [But others because of bugs in memory allocation/management, etc.] Have you tried ulimit -s unlimited before you run the program? Are you using a single machine or a cluster? If you're using infiniband

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Giacomo Rossi
gdb /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 GNU gdb (GDB) 7.11 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gilles Gouaillardet
Giacomo, one option (if your shell is bash) is to run: ulimit -c unlimited mpif90 -v and you should get a core file. Another option is to run: gdb /.../mpif90 r -v bt Cheers, Gilles On Thursday, May 5, 2016, Giacomo Rossi wrote: > Here the result of ldd command: > 'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Giacomo Rossi
Here is the result of the ldd command: ldd /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 linux-vdso.so.1 (0x7ffcacbbe000) libopen-pal.so.13 => /opt/openmpi/1.10.2/intel/16.0.3/lib/libopen-pal.so.13 (0x7fa9597a9000) libm.so.6 => /usr/lib/libm.so.6 (0x7fa9594a4000) libpciaccess.so.0 => /usr/lib/l

[OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gilles Gouaillardet
Giacomo, could you also open the core file with gdb and post the backtrace? Can you also ldd mpif90 and confirm no Intel MPI library is used? btw, the OpenMPI fortran wrapper is now mpifort Cheers, Gilles On Thursday, May 5, 2016, Giacomo Rossi wrote: > I’ve installed the latest version

[OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Giacomo Rossi
I’ve installed the latest version of Intel Parallel Studio (16.0.3), then I’ve downloaded the latest version of openmpi (1.10.2) and I’ve compiled it with `./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi/1.10.2/intel/16.0.3`, then I've installed it and everything seems ok, but wh

Re: [OMPI users] OMPI users] segmentation fault with java MPI

2016-01-27 Thread Gilles Gouaillardet
ayInputStream in = new ByteArrayInputStream(readbuf); > ObjectInputStream is = new ObjectInputStream(in); > System.out.println("Program fine until this line!"); > test = is.readObject(); > } > > MPI.Finalize(); > } > }

Re: [OMPI users] segmentation fault with java MPI

2016-01-25 Thread Gilles Gouaillardet
dbuf, filesize, MPI.BYTE); > Object test = null; > ByteArrayInputStream in = new ByteArrayInputStream(readbuf); > ObjectInputStream is = new ObjectInputStream(in); > System.out.println("Program fine until this line!"); > test = is.r

Re: [OMPI users] segmentation fault with java MPI

2016-01-25 Thread Marko Blatzheim
tInputStream(in); System.out.println("Program fine until this line!"); test = is.readObject(); } MPI.Finalize(); } } Thanks Marko Sent: Monday, January 25, 2016 at 01:04 AM From: "Gilles Gouaillardet" To: "Open MPI Users"

Re: [OMPI users] segmentation fault with java MPI

2016-01-24 Thread Gilles Gouaillardet
Marko, I wrote a test program based on your code snippet and it works for me. Could you please: - post a standalone test case that is ready to be compiled and run - which version of OpenMPI are you using? - which JVM are you using? (vendor and version) - post your full command line Cheers,

[OMPI users] segmentation fault with java MPI

2016-01-24 Thread Marko Blatzheim
Hi, I want to load a saved object using java mpi. Without MPI there is no problem in reading the file and casting it to the correct type. I tried to open the file as a byte array and convert this to an object. I checked that all bytes are read correctly. Here I have an example where the saved

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-07 Thread George Bosilca
Bogdan, The bug was solely related to the number of entries in the datatype, and not the number of elements nor the size/extent of the datatype. As such, 64-bit support was not impacted by this bug. From the user perspective, the only visible improvement is the possibility to create datatypes w

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-06 Thread Bogdan Sataric
George thank you very much! So can I assume that the new indexed type in 1.8.5 will support 64-bit large datatypes, i.e. beyond the current 4GB datatypes (and some strange internal restrictions in my case)? Or is there any clue what the improvement over the existing datatype restrictions will be? Regards

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread George Bosilca
On Thu, Mar 5, 2015 at 6:22 PM, Bogdan Sataric wrote: > Hello George, > > So is it safe for me to assume that my code is good and that you will > remove this bug from next OpenMPI version? > Yes I think it is safe to assume your code is correct (or at least it follows the specifications you desc

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread Bogdan Sataric
Hello Tom, Actually I have tried using MPI_Type_create_hindexed, but the same problem persisted for the same matrix dimensions. Displacement array values are not a problem. A matrix of size 800x640x480 creates a type that is a bit less than 4GB large in the case of the complex datatype. It definitely fits

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread Bogdan Sataric
Hello George, So is it safe for me to assume that my code is good and that you will remove this bug from next OpenMPI version? Also I would like to know which future OpenMPI version will incorporate this fix (so I can try my code in fixed version)? Thank you, Bogdan Sataric email: bogdan

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread Tom Rosmond
Actually, you are not the first to encounter the problem with 'MPI_Type_indexed' for very large datatypes. I also run with a 1.6 release, and solved the problem by switching to 'MPI_Type_create_hindexed' for the datatype. The critical difference is that the displacements for 'MPI_Type_indexed' ar
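
A minimal sketch of the switch Tom describes, with made-up block counts and sizes: MPI_Type_create_hindexed takes byte displacements as MPI_Aint (typically 64-bit), whereas MPI_Type_indexed takes element displacements as int, which can overflow for very large datatypes.

    #include <mpi.h>
    #include <stdlib.h>

    /* Sketch only: block counts and sizes are illustrative, not Bogdan's. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        const int nblocks = 1000, blocklen = 1024;               /* elements per block */
        int      *lengths = malloc(nblocks * sizeof(int));
        MPI_Aint *displs  = malloc(nblocks * sizeof(MPI_Aint));  /* byte displacements */

        for (int i = 0; i < nblocks; i++) {
            lengths[i] = blocklen;
            /* Computed in bytes as MPI_Aint, so the value stays exact past 2^31. */
            displs[i] = (MPI_Aint)i * blocklen * (MPI_Aint)sizeof(double);
        }

        MPI_Datatype newtype;
        MPI_Type_create_hindexed(nblocks, lengths, displs, MPI_DOUBLE, &newtype);
        MPI_Type_commit(&newtype);

        /* ... use newtype as the datatype argument of sends/receives ... */

        MPI_Type_free(&newtype);
        free(lengths);
        free(displs);
        MPI_Finalize();
        return 0;
    }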

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread George Bosilca
Bogdan, As far as I can tell your code is correct, and the problem is coming from Open MPI. More specifically, I used alloca in the optimization stage in MPI_Type_commit, and as your arrays of lengths were too large, alloca failed and led to a segfault. I fixed it in the trunk (3c489ea), and this wil

[OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread Bogdan Sataric
I've been having problems with my 3D matrix transpose program. I'm using MPI_Type_indexed in order to align specific blocks that I want to send and receive across one or multiple nodes of a cluster. Up to a few days ago I was able to run my program without any errors. However several test cases on t

Re: [OMPI users] Segmentation fault when using CUDA Aware feature

2015-01-12 Thread Rolf vandeVaart
Subject: RE: [OMPI users] Segmentation fault when using CUDA Aware feature That is strange, not sure why that is happening. I will try to reproduce with your program on my system. Also, perhaps you could rerun with --mca mpi_common_cuda_verbose 100 and send me that output. Thanks From: users

Re: [OMPI users] Segmentation fault when using CUDA Aware feature

2015-01-12 Thread Rolf vandeVaart
, January 11, 2015 11:41 PM To: us...@open-mpi.org Subject: [OMPI users] Segmentation fault when using CUDA Aware feature Hi, The OpenMpi I used is 1.8.4. I just tried to run a test program to see if the CUDA aware feature works. But I got the following errors. ss@ss-Inspiron-5439:~/cuda-workspace

[OMPI users] Segmentation fault when using CUDA Aware feature

2015-01-11 Thread Xun Gong
Hi, The OpenMpi I used is 1.8.4. I just tried to run a test program to see if the CUDA aware feature works. But I got the following errors. ss@ss-Inspiron-5439:~/cuda-workspace/cuda_mpi_ex1$ mpirun -np 2 s1 [ss-Inspiron-5439:32514] *** Process received signal *** [ss-Inspiron-5439:32514] Signal:

[OMPI users] segmentation fault for Java in openmpi-1.9a1r32669 on Solaris 10 Sparc

2014-09-05 Thread Siegmar Gross
Hi, today I installed openmpi-1.9a1r32669 on my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc1), and openSUSE Linux 12.1 x86_64 (linpc1)) with Sun C 5.12 and gcc-4.9.0. I get the following segmentation fault for my Sun C version on Solaris Sparc for Java programs. tyr java 137 ompi_

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Indeed odd - I'm afraid that this is just the kind of case that has been causing problems. I think I've figured out the problem, but have been buried with my "day job" for the last few weeks and unable to pursue it. On Aug 18, 2014, at 11:10 AM, Maxime Boissonneault wrote: > Ok, I confirm th

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Ok, I confirm that with mpiexec -mca oob_tcp_if_include lo ring_c it works. It also works with mpiexec -mca oob_tcp_if_include ib0 ring_c We have 4 interfaces on this node. - lo, the local loop - ib0, infiniband - eth2, a management network - eth3, the public network It seems that mpiexec atte

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Yeah, there are some issues with the internal connection logic that need to get fixed. We haven't had many cases where it's been an issue, but a couple like this have cropped up - enough that I need to set aside some time to fix it. My apologies for the problem. On Aug 18, 2014, at 10:31 AM, M

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Indeed, that makes sense now. Why isn't OpenMPI attempting to connect with the local loop for the same node? This used to work with 1.6.5. Maxime On 2014-08-18 13:11, Ralph Castain wrote: Yep, that pinpointed the problem: [helios-login1:28558] [[63019,1],0] tcp:send_handler CONNECTING [heli

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Yep, that pinpointed the problem: [helios-login1:28558] [[63019,1],0] tcp:send_handler CONNECTING [helios-login1:28558] [[63019,1],0]:tcp:complete_connect called for peer [[63019,0],0] on socket 11 [helios-login1:28558] [[63019,1],0]-[[63019,0],0] tcp_peer_complete_connect: connection failed: Co

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Here it is. Maxime On 2014-08-18 12:59, Ralph Castain wrote: Ah...now that showed the problem. To pinpoint it better, please add -mca oob_base_verbose 10 and I think we'll have it On Aug 18, 2014, at 9:54 AM, Maxime Boissonneault wrote: This is all on one node indeed. Attached is th

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Ah...now that showed the problem. To pinpoint it better, please add -mca oob_base_verbose 10 and I think we'll have it On Aug 18, 2014, at 9:54 AM, Maxime Boissonneault wrote: > This is all on one node indeed. > > Attached is the output of > mpirun -np 4 --mca plm_base_verbose 10 -mca odls_

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
This is all on one node indeed. Attached is the output of mpirun -np 4 --mca plm_base_verbose 10 -mca odls_base_verbose 5 -mca state_base_verbose 5 -mca errmgr_base_verbose 5 ring_c |& tee output_ringc_verbose.txt Maxime On 2014-08-18 12:48, Ralph Castain wrote: This is all on one nod

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
This is all on one node, yes? Try adding the following: -mca odls_base_verbose 5 -mca state_base_verbose 5 -mca errmgr_base_verbose 5 Lot of garbage, but should tell us what is going on. On Aug 18, 2014, at 9:36 AM, Maxime Boissonneault wrote: > Here it is > Le 2014-08-18 12:30, Joshua Ladd

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Here it is On 2014-08-18 12:30, Joshua Ladd wrote: mpirun -np 4 --mca plm_base_verbose 10 [mboisson@helios-login1 examples]$ mpirun -np 4 --mca plm_base_verbose 10 ring_c [helios-login1:27853] mca: base: components_register: registering plm components [helios-login1:27853] mca: base: compone

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Joshua Ladd
Maxime, Can you run with: mpirun -np 4 --mca plm_base_verbose 10 /path/to/examples//ring_c On Mon, Aug 18, 2014 at 12:21 PM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > Hi, > I just did compile without Cuda, and the result is the same. No output, > exits with code

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Hi, I just did compile without Cuda, and the result is the same. No output, exits with code 65. [mboisson@helios-login1 examples]$ ldd ring_c linux-vdso.so.1 => (0x7fff3ab31000) libmpi.so.1 => /software-gpu/mpi/openmpi/1.8.2rc4_gcc4.8_nocuda/lib/libmpi.so.1 (0x7fab9ec

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-16 Thread Maxime Boissonneault
There is indeed also a problem with MPI + Cuda. This problem however is deeper, since it happens with Mvapich2 1.9, OpenMPI 1.6.5/1.8.1/1.8.2rc4, and Cuda 5.5.22/6.0.37. From my tests, everything works fine with MPI + Cuda on a single node, but as soon as I go to MPI + Cuda across nodes, I get s

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-16 Thread Jeff Squyres (jsquyres)
Just out of curiosity, I saw that one of the segv stack traces involved the cuda stack. Can you try a build without CUDA and see if that resolves the problem? On Aug 15, 2014, at 6:47 PM, Maxime Boissonneault wrote: > Hi Jeff, > > Le 2014-08-15 17:50, Jeff Squyres (jsquyres) a écrit : >> O

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Hi Jeff, On 2014-08-15 17:50, Jeff Squyres (jsquyres) wrote: On Aug 15, 2014, at 5:39 PM, Maxime Boissonneault wrote: Correct. Can it be because torque (pbs_mom) is not running on the head node and mpiexec attempts to contact it? Not for Open MPI's mpiexec, no. Open MPI's mpiexec (mp

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Jeff Squyres (jsquyres)
On Aug 15, 2014, at 5:39 PM, Maxime Boissonneault wrote: > Correct. > > Can it be because torque (pbs_mom) is not running on the head node and > mpiexec attempts to contact it ? Not for Open MPI's mpiexec, no. Open MPI's mpiexec (mpirun -- they're the same to us) will only try to use TM stu

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Correct. Can it be because torque (pbs_mom) is not running on the head node and mpiexec attempts to contact it? Maxime On 2014-08-15 17:31, Joshua Ladd wrote: But OMPI 1.8.x does run the ring_c program successfully on your compute node, right? The error only happens on the front-end logi

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Joshua Ladd
But OMPI 1.8.x does run the ring_c program successfully on your compute node, right? The error only happens on the front-end login node if I understood you correctly. Josh On Fri, Aug 15, 2014 at 5:20 PM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > Here are the reques

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Here are the requested files. In the archive, you will find the output of configure, make, make install as well as the config.log, the environment when running ring_c and the ompi_info --all. Just for a reminder, the ring_c example compiled and ran, but produced no output when running and ex

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Hi, I solved the warning that appeared with OpenMPI 1.6.5 on the login node. I increased the registrable memory. Now, with OpenMPI 1.6.5, it does not give any warning. Yet, with OpenMPI 1.8.1 and OpenMPI 1.8.2rc4, it still exits with error code 65 and does not produce the normal output. I w

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Hi Josh, The ring_c example does not work on our login node : [mboisson@helios-login1 examples]$ mpiexec -np 10 ring_c [mboisson@helios-login1 examples]$ echo $? 65 [mboisson@helios-login1 examples]$ echo $LD_LIBRARY_PATH /software-gpu/mpi/openmpi/1.8.2rc4_gcc4.8_cuda6.0.37/lib:/usr/lib64/nvidia:

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
One more, Maxime, can you please make sure you've covered everything here: http://www.open-mpi.org/community/help/ Josh On Thu, Aug 14, 2014 at 3:18 PM, Joshua Ladd wrote: > And maybe include your LD_LIBRARY_PATH > > Josh > > > On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd wrote: > >> Can you

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
And maybe include your LD_LIBRARY_PATH Josh On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd wrote: > Can you try to run the example code "ring_c" across nodes? > > Josh > > > On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault < > maxime.boissonnea...@calculquebec.ca> wrote: > >> Yes, >> Every

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
Can you try to run the example code "ring_c" across nodes? Josh On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > Yes, > Everything has been built with GCC 4.8.x, although x might have changed > between the OpenMPI 1.8.1 build and the gromac

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
Yes, Everything has been built with GCC 4.8.x, although x might have changed between the OpenMPI 1.8.1 build and the gromacs build. For OpenMPI 1.8.2rc4 however, it was the exact same compiler for everything. Maxime On 2014-08-14 14:57, Joshua Ladd wrote: Hmmm...weird. Seems like maybe a m

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
Hmmm...weird. Seems like maybe a mismatch between libraries. Did you build OMPI with the same compiler as you did GROMACS/Charm++? I'm stealing this suggestion from an old Gromacs forum with essentially the same symptom: "Did you compile Open MPI and Gromacs with the same compiler (i.e. both gcc

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
I just tried Gromacs with two nodes. It crashes, but with a different error. I get [gpu-k20-13:142156] *** Process received signal *** [gpu-k20-13:142156] Signal: Segmentation fault (11) [gpu-k20-13:142156] Signal code: Address not mapped (1) [gpu-k20-13:142156] Failing at address: 0x8 [gpu-k20-1

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
What about between nodes? Since this is coming from the OpenIB BTL, would be good to check this. Do you know what the MPI thread level is set to when used with the Charm++ runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is not thread safe. Josh On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boisson
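
One way to answer Josh's thread-level question is to check what the library actually granted at initialization; a minimal sketch, not tied to the Charm++ runtime:

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch only: report the thread level the MPI library actually provides. */
    int main(int argc, char **argv) {
        int provided;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
            printf("MPI_THREAD_MULTIPLE requested, but only level %d provided\n", provided);

        MPI_Finalize();
        return 0;
    }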

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
Hi, I ran gromacs successfully with OpenMPI 1.8.1 and Cuda 6.0.37 on a single node, with 8 ranks and multiple OpenMP threads. Maxime On 2014-08-14 14:15, Joshua Ladd wrote: Hi, Maxime Just curious, are you able to run a vanilla MPI program? Can you try one of the example programs in

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
Hi, Maxime Just curious, are you able to run a vanilla MPI program? Can you try one of the example programs in the "examples" subdirectory. Looks like a threading issue to me. Thanks, Josh
