Re: [OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users
3 10:08 AM *To:* Jeff Squyres (jsquyres) ; Open MPI Users *Subject:* Re: [OMPI users] Segmentation fault Hi Jeff, I also tried with OpenMPI 4.1.5, I got the same error. On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote: I'm afraid I don't know anything about the SU2 application. You are u

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
simple MPI application that replicates the issue? That would be something we could dig into and investigate. From: Aziz Ogutlu Sent: Wednesday, August 9, 2023 10:31 AM To: Jeff Squyres (jsquyres) ; Open MPI Users Subject: Re: [OMPI users] Segmentation fault Hi J

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
ation. From: Aziz Ogutlu Sent: Wednesday, August 9, 2023 10:08 AM To: Jeff Squyres (jsquyres) ; Open MPI Users Subject: Re: [OMPI users] Segmentation fault Hi Jeff, I also tried with OpenMPI 4.1.5, I got the same error. On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote: I'

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Aziz Ogutlu via users
Hi Jeff, I also tried with OpenMPI 4.1.5, I got the same error. On 8/9/23 17:05, Jeff Squyres (jsquyres) wrote: I'm afraid I don't know anything about the SU2 application. You are using Open MPI v4.0.3, which is fairly old.  Many bug fixes have been released since that version.  Can you upgrade

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't know anything about the SU2 application. You are using Open MPI v4.0.3, which is fairly old. Many bug fixes have been released since that version. Can you upgrade to the latest version of Open MPI (v4.1.5)? From: users on behalf of Aziz Ogut

Re: [OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Raymond Arter via users
Jeff and Steven, Thanks for your help. I downloaded the nightly snapshot and it fixes the problem. I need to do more testing tomorrow and I will report back if any issues arise. Thanks again. T. On 10/07/2019 18:44, Jeff Squyres (jsquyres) via users wrote: It might be worth trying the lates

Re: [OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Jeff Squyres (jsquyres) via users
It might be worth trying the latest v4.0.x nightly snapshot -- we just updated the internal PMIx on the v4.0.x branch: https://www.open-mpi.org/nightly/v4.0.x/ > On Jul 10, 2019, at 1:29 PM, Steven Varga via users > wrote: > > Hi i am fighting similar. Did you try to update the pmix most

Re: [OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Steven Varga via users
Hi, I am fighting something similar. Did you try updating PMIx to the most recent 3.1.3 series release? On Wed, Jul 10, 2019, 12:24 Raymond Arter via users, < users@lists.open-mpi.org> wrote: > Hi, > > I have the following issue with version 4.0.1 when running on a node with > two 16 core CPUs (Intel Xeon Gol

Re: [OMPI users] Segmentation fault using openmpi-master-201901030305-ee26ed9

2019-01-04 Thread Howard Pritchard
Hi Sigmar, I observed this problem yesterday myself and should have a fix in to master later today. Howard Am Fr., 4. Jan. 2019 um 05:30 Uhr schrieb Siegmar Gross < siegmar.gr...@informatik.hs-fulda.de>: > Hi, > > I've installed (tried to install) openmpi-master-201901030305-ee26ed9 on > my "

Re: [OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21 Thread r...@open-mpi.org
Can you provide a backtrace with line numbers from a debug build? We don’t get much testing with lsf, so it is quite possible there is a bug in there. > On Feb 21, 2017, at 7:39 PM, Hammond, Simon David (-EXP) > wrote: > > Hi OpenMPI Users, > > Has anyone successfully tested OpenMPI 1.10.6 wi

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-03 Thread Siegmar Gross
Hi Howard, it still works with 4 processes and "vader" will not send the following output about missing communication peers if I start at least 2 processes. ... [loki:14965] select: initializing btl component vader [loki][[42444,1],0][../../../../../openmpi-2.0.2rc2/opal/mca/btl/vader/btl_vader_

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-03 Thread Howard Pritchard
Hi Siegmar, Could you please rerun the spawn_slave program with 4 processes? Your original traceback indicates a failure in the barrier in the slave program. I'm interested in seeing whether, when you run the slave program standalone with 4 processes, the barrier failure is observed. Thanks, Howard
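
A minimal sketch (assuming a C slave; this is not Siegmar's actual spawn_slave source) of a slave program that can be launched either via MPI_Comm_spawn or standalone with mpiexec -np 4: when started standalone, MPI_Comm_get_parent returns MPI_COMM_NULL and the barrier simply runs over MPI_COMM_WORLD instead of the parent inter-communicator.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        MPI_Barrier(MPI_COMM_WORLD);          /* standalone run */
        printf("rank %d: standalone barrier passed\n", rank);
    } else {
        MPI_Barrier(parent);                  /* spawned: barrier on the intercomm */
        printf("rank %d: barrier with parent passed\n", rank);
        MPI_Comm_disconnect(&parent);
    }
    MPI_Finalize();
    return 0;
}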

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-02 Thread Siegmar Gross
Hi Howard, thank you very much for trying to solve my problem. I haven't changed the programs since 2013, so you are using the correct version. The program works as expected with the master trunk, as you can see at the bottom of this email from my last mail. The slave program works when I launch i

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-02 Thread Howard Pritchard
Hi Siegmar, I've attempted to reproduce this using the GNU compilers and the version of the test program(s) you posted earlier in 2016, but am unable to reproduce the problem. Could you double check that the slave program can be successfully run when launched directly by mpirun/mpiexec? It might also

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-23 Thread Howard Pritchard
Hi Paul, Thanks very much for the Christmas present. The Open MPI README has been updated to include a note about issues with the Intel 16.0.3-4 compiler suites. Enjoy the holidays, Howard 2016-12-23 3:41 GMT-07:00 Paul Kapinos : > Hi all, > > we discussed this issue with Intel compiler support and

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-23 Thread Paul Kapinos
Hi all, we discussed this issue with Intel compiler support and it looks like they now know what the issue is and how to protect against it. It is a known issue resulting from a backwards incompatibility in an OS/glibc update, cf. https://sourceware.org/bugzilla/show_bug.cgi?id=20019 Affected ver

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-14 Thread Paul Kapinos
Hello all, we seem to run into the same issue: 'mpif90' sigsegvs immediately for Open MPI 1.10.4 compiled using Intel compilers 16.0.4.258 and 16.0.3.210, while it works fine when compiled with 16.0.2.181. It seems to be a compiler issue (more exactly: library issue on libs delivered with 16.

Re: [OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)

2016-11-25 Thread George Bosilca
At the first glance I would say you are confusing the variables counting your requests, reqcount and nrequests. George. On Fri, Nov 25, 2016 at 7:11 AM, Paolo Pezzutto wrote: > Dear all, > > I am struggling with an invalid memory reference when calling SUB EXC_MPI > (MOD01), and precisely at
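
A minimal sketch (not the poster's actual EXC_MPI/MOD01 code, and in C rather than Fortran) of the pattern George is pointing at: the count handed to MPI_Startall and MPI_Waitall must be the same single counter that tracked how many persistent requests were actually initialized.

#include <mpi.h>

/* Persistent ring exchange: one counter (nreq) is used for setup,
 * start, and completion, so the counts can never get out of sync. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sendbuf = (double)rank, recvbuf = -1.0;
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    MPI_Request reqs[2];
    int nreq = 0;                        /* the single request counter */
    MPI_Recv_init(&recvbuf, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[nreq++]);
    MPI_Send_init(&sendbuf, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[nreq++]);

    MPI_Startall(nreq, reqs);            /* same nreq everywhere */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

    for (int i = 0; i < nreq; i++)
        MPI_Request_free(&reqs[i]);
    MPI_Finalize();
    return 0;
}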

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-27 Thread Siegmar Gross
Hi Ralph, On 26.05.2016 at 17:38, Ralph Castain wrote: I’m afraid I honestly can’t make any sense of it. It seems you at least have a simple workaround (use a hostfile instead of -host), yes? Only the combination of "--host" and "--slot-list" breaks. Everything else works as expected. One more

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-26 Thread Ralph Castain
I’m afraid I honestly can’t make any sense of it. It seems you at least have a simple workaround (use a hostfile instead of -host), yes? > On May 26, 2016, at 5:48 AM, Siegmar Gross > wrote: > > Hi Ralph and Gilles, > > it's strange that the program works with "--host" and "--slot-list" > in

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-26 Thread Siegmar Gross
Hi Ralph and Gilles, it's strange that the program works with "--host" and "--slot-list" in your environment and not in mine. I get the following output, if I run the program in gdb without a breakpoint. loki spawn 142 gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec GNU gdb (GDB; SUSE Linux En

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-25 Thread Siegmar Gross
Hi, I've updated to rc3 and still have the same error. Is the following output helpful to see what's going on on my machine? loki spawn 145 gdb /usr/local/openmpi-1.10.3_64_gcc/bin/mpiexec GNU gdb (GDB; SUSE Linux Enterprise 12) 7.9.1 ... Reading symbols from /usr/local/openmpi-1.10.3_64_gcc/bi

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Ralph Castain
Works perfectly for me, so I believe this must be an environment issue - I am using gcc 6.0.0 on CentOS7 with x86: $ mpirun -n 1 -host bend001 --slot-list 0:0-1,1:0-1 --report-bindings ./simple_spawn [bend001:17599] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socke

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Siegmar Gross
Hi Ralph and Gilles, the program breaks only if I combine "--host" and "--slot-list". Perhaps this information is helpful. I am using a different machine now, so that you can see that the problem is not restricted to "loki". pc03 spawn 115 ompi_info | grep -e "OPAL repo revision:" -e "C compiler a

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Ralph Castain
> On May 24, 2016, at 6:21 AM, Siegmar Gross > wrote: > > Hi Ralph, > > I copy the relevant lines to this place, so that it is easier to see what > happens. "a.out" is your program, which I compiled with mpicc. > > >> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C compiler > >

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Siegmar Gross
Hi Ralph, I copy the relevant lines to this place, so that it is easier to see what happens. "a.out" is your program, which I compiled with mpicc. >> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C compiler >> absolute:" >> OPAL repo revision: v1.10.2-201-gd23dda8 >> C co

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Jeff Squyres (jsquyres)
On May 24, 2016, at 7:19 AM, Siegmar Gross wrote: > > I don't see a difference for my spawned processes, because both functions will > "wait" until all pending operations have finished, before the object will be > destroyed. Nevertheless, perhaps my small example program worked all the years > b

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Ralph Castain
> On May 24, 2016, at 4:19 AM, Siegmar Gross > wrote: > > Hi Ralph, > > thank you very much for your answer and your example program. > > On 05/23/16 17:45, Ralph Castain wrote: >> I cannot replicate the problem - both scenarios work fine for me. I’m not >> convinced your test code is correct

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Siegmar Gross
Hi Ralph, thank you very much for your answer and your example program. On 05/23/16 17:45, Ralph Castain wrote: I cannot replicate the problem - both scenarios work fine for me. I’m not convinced your test code is correct, however, as you call Comm_free the inter-communicator but didn’t call Co

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-23 Thread Ralph Castain
I cannot replicate the problem - both scenarios work fine for me. I’m not convinced your test code is correct, however, as you call Comm_free on the inter-communicator but didn’t call Comm_disconnect. Check out the attached for a corrected code and see if it works for you. FWIW: I don’t know how many
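
A minimal parent-side sketch of what Ralph describes, assuming a slave binary named ./spawn_slave: tear down the inter-communicator returned by MPI_Comm_spawn with MPI_Comm_disconnect rather than MPI_Comm_free, so both sides wait for pending communication before the connection is dropped.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Init(&argc, &argv);

    /* Spawn 4 children; intercomm connects parent and child groups. */
    MPI_Comm_spawn("./spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

    /* ... communication with the children over intercomm ... */

    MPI_Comm_disconnect(&intercomm);   /* collective; waits for pending ops */
    MPI_Finalize();
    return 0;
}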

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-09 Thread Giacomo Rossi
I've sent you all the outputs from the configure, make and make install commands... Today I've compiled openmpi with the latest gcc version (6.1.1) shipped with my archlinux distro and everything seems ok, so I think that the problem is with the intel compiler. Giacomo Rossi Ph.D., Space Engineer Resear

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Dave Love
Gus Correa writes: > Hi Giacomo > > Some programs fail with segmentation fault > because the stack size is too small. Yes, the default for Intel Fortran is to allocate large-ish amounts on the stack, which may matter when the compiled program runs. However, look at the backtrace. It's apparent

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Jeff Squyres (jsquyres)
Ok, good. I asked that question because typically when we see errors like this, it is usually either a busted compiler installation or inadvertently mixing the run-times of multiple different compilers in some kind of incompatible way. Specifically, the mpifort (aka mpif90) application is a fa

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Giacomo Rossi
Yes, I've tried three simple "Hello world" programs in Fortran, C and C++, and they compile and run with intel 16.0.3. The problem is with the openmpi compiled from source. Giacomo Rossi Ph.D., Space Engineer Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza" University of

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Jeff Squyres (jsquyres)
Giacomo -- Are you able to run anything that is compiled by that Intel compiler installation? > On May 5, 2016, at 12:02 PM, Gus Correa wrote: > > Hi Giacomo > > Some programs fail with segmentation fault > because the stack size is too small. > [But others because of bugs in memory allocat

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gus Correa
Hi Giacomo Some programs fail with segmentation fault because the stack size is too small. [But others because of bugs in memory allocation/management, etc.] Have you tried ulimit -s unlimited before you run the program? Are you using a single machine or a cluster? If you're using infiniband

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Giacomo Rossi
gdb /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 GNU gdb (GDB) 7.11 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gilles Gouaillardet
Giacomo, one option is to (if your shell is bash) run "ulimit -c unlimited" and then "mpif90 -v"; you should get a core file. Another option is to run gdb /.../mpif90, then "r -v", then "bt". Cheers, Gilles On Thursday, May 5, 2016, Giacomo Rossi wrote: > Here is the result of the ldd command: > 'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Giacomo Rossi
Here the result of ldd command: 'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 linux-vdso.so.1 (0x7ffcacbbe000) libopen-pal.so.13 => /opt/openmpi/1.10.2/intel/16.0.3/lib/libopen-pal.so.13 (0x7fa9597a9000) libm.so.6 => /usr/lib/libm.so.6 (0x7fa9594a4000) libpciaccess.so.0 => /usr/lib/l

Re: [OMPI users] segmentation fault with java MPI

2016-01-25 Thread Gilles Gouaillardet
dbuf, filesize, MPI.BYTE); > Object test = null; > ByteArrayInputStream in = new ByteArrayInputStream(readbuf); > ObjectInputStream is = new ObjectInputStream(in); > System.out.println("Program fine until this line!"); > test = is.r

Re: [OMPI users] segmentation fault with java MPI

2016-01-25 Thread Marko Blatzheim
tInputStream(in); System.out.println("Program fine until this line!"); test = is.readObject(); } MPI.Finalize(); } } Thanks Marko Sent: Monday, 25 January 2016 at 01:04 From: "Gilles Gouaillardet" To: "Open MPI Users&qu

Re: [OMPI users] segmentation fault with java MPI

2016-01-24 Thread Gilles Gouaillardet
Marko, i wrote a test program based on your code snippet and it works for me. could you please : - post a standalone test case that is ready to be compiled and ran - which version of OpenMPI are you using ? - which JVM are you using ? (vendor and version) - post your full command line Cheers,

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-07 Thread George Bosilca
Bogdan, The bug was solely related to the number of entries in the datatype, and not the number of elements nor the size/extent of the datatype. As such, 64-bit support was not impacted by this bug. From the user perspective, the only visible improvement is the possibility to create datatypes w

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-06 Thread Bogdan Sataric
George, thank you very much! So can I assume that the new indexed type in 1.8.5 will support 64-bit large datatypes, beyond the current 4GB datatypes (and the strange internal restrictions in my case)? Or is there any clue as to what the improvement over the existing datatype restrictions will be? Regards

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread George Bosilca
On Thu, Mar 5, 2015 at 6:22 PM, Bogdan Sataric wrote: > Hello George, > > So is it safe for me to assume that my code is good and that you will > remove this bug from next OpenMPI version? > Yes I think it is safe to assume your code is correct (or at least it follows the specifications you desc

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread Bogdan Sataric
Hello Tom, Actually I have tried using MPI_Type_create_hindexed, but the same problem persisted for the same matrix dimensions. The displacements array values are not a problem. A matrix of size 800x640x480 creates a type that is a bit less than 4GB in the case of a complex datatype. It definitely fits
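
A quick back-of-the-envelope check of that size, assuming a 16-byte complex element (e.g. double complex); the figures below are arithmetic only, not measured behaviour:

#include <stdio.h>

int main(void)
{
    long long elems = 800LL * 640LL * 480LL;   /* 245,760,000 elements */
    long long bytes = elems * 16LL;            /* 3,932,160,000 bytes  */
    printf("%lld bytes (~%.2f GiB); 4 GiB boundary = %lld bytes\n",
           bytes, bytes / (1024.0 * 1024.0 * 1024.0), 1LL << 32);
    return 0;
}
/* -> about 3.66 GiB, indeed just under the 4 GiB (2^32) boundary where
 *    32-bit internal counters start to overflow. */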

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread Bogdan Sataric
Hello George, So is it safe for me to assume that my code is good and that you will remove this bug from next OpenMPI version? Also I would like to know which future OpenMPI version will incorporate this fix (so I can try my code in fixed version)? Thank you, Bogdan Sataric email: bogdan

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread Tom Rosmond
Actually, you are not the first to encounter the problem with 'MPI_Type_indexed' for very large datatypes. I also run with a 1.6 release, and solved the problem by switching to 'MPI_Type_create_hindexed' for the datatype. The critical difference is that the displacements for 'MPI_Type_indexed' ar
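
A sketch of the switch Tom describes, with illustrative names and sizes only: MPI_Type_indexed takes int displacements counted in units of the old type, while MPI_Type_create_hindexed takes MPI_Aint displacements in bytes, which do not overflow 32 bits for very large layouts.

#include <mpi.h>
#include <stdlib.h>

/* Build an hindexed type of nblocks blocks of blocklen complex elements,
 * separated by stride_bytes. The displacements are MPI_Aint (bytes). */
static MPI_Datatype build_hindexed_type(int nblocks, int blocklen,
                                        MPI_Aint stride_bytes)
{
    MPI_Datatype newtype;
    int      *blocklens = malloc(nblocks * sizeof(*blocklens));
    MPI_Aint *displs    = malloc(nblocks * sizeof(*displs));

    for (int i = 0; i < nblocks; i++) {
        blocklens[i] = blocklen;
        displs[i]    = (MPI_Aint)i * stride_bytes;   /* byte displacement */
    }
    MPI_Type_create_hindexed(nblocks, blocklens, displs,
                             MPI_C_DOUBLE_COMPLEX, &newtype);
    MPI_Type_commit(&newtype);

    free(blocklens);
    free(displs);
    return newtype;
}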

Re: [OMPI users] Segmentation fault with MPI_Type_indexed

2015-03-05 Thread George Bosilca
Bogdan, As far as I can tell your code is correct, and the problem is coming from Open MPI. More specifically, I used alloca in the optimization stage of MPI_Type_commit, and as your arrays of lengths were too large, alloca failed and led to a segfault. I fixed it in the trunk (3c489ea), and this wil

Re: [OMPI users] Segmentation fault when using CUDA Aware feature

2015-01-12 Thread Rolf vandeVaart
Subject: RE: [OMPI users] Segmentation fault when using CUDA Aware feature That is strange, not sure why that is happening. I will try to reproduce with your program on my system. Also, perhaps you could rerun with --mca mpi_common_cuda_verbose 100 and send me that output. Thanks From: users

Re: [OMPI users] Segmentation fault when using CUDA Aware feature

2015-01-12 Thread Rolf vandeVaart
That is strange, not sure why that is happening. I will try to reproduce with your program on my system. Also, perhaps you could rerun with --mca mpi_common_cuda_verbose 100 and send me that output. Thanks From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Xun Gong Sent: Sunday, Janu

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Indeed odd - I'm afraid that this is just the kind of case that has been causing problems. I think I've figured out the problem, but have been buried with my "day job" for the last few weeks and unable to pursue it. On Aug 18, 2014, at 11:10 AM, Maxime Boissonneault wrote: > Ok, I confirm th

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Ok, I confirm that with mpiexec -mca oob_tcp_if_include lo ring_c it works. It also works with mpiexec -mca oob_tcp_if_include ib0 ring_c We have 4 interfaces on this node. - lo, the local loop - ib0, infiniband - eth2, a management network - eth3, the public network It seems that mpiexec atte

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Yeah, there are some issues with the internal connection logic that need to get fixed. We haven't had many cases where it's been an issue, but a couple like this have cropped up - enough that I need to set aside some time to fix it. My apologies for the problem. On Aug 18, 2014, at 10:31 AM, M

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Indeed, that makes sense now. Why isn't OpenMPI attempting to connect with the local loop for the same node? This used to work with 1.6.5. Maxime On 2014-08-18 13:11, Ralph Castain wrote: Yep, that pinpointed the problem: [helios-login1:28558] [[63019,1],0] tcp:send_handler CONNECTING [heli

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Yep, that pinpointed the problem: [helios-login1:28558] [[63019,1],0] tcp:send_handler CONNECTING [helios-login1:28558] [[63019,1],0]:tcp:complete_connect called for peer [[63019,0],0] on socket 11 [helios-login1:28558] [[63019,1],0]-[[63019,0],0] tcp_peer_complete_connect: connection failed: Co

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Here it is. Maxime On 2014-08-18 12:59, Ralph Castain wrote: Ah...now that showed the problem. To pinpoint it better, please add -mca oob_base_verbose 10 and I think we'll have it On Aug 18, 2014, at 9:54 AM, Maxime Boissonneault wrote: This is all on one node indeed. Attached is th

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
Ah...now that showed the problem. To pinpoint it better, please add -mca oob_base_verbose 10 and I think we'll have it On Aug 18, 2014, at 9:54 AM, Maxime Boissonneault wrote: > This is all one one node indeed. > > Attached is the output of > mpirun -np 4 --mca plm_base_verbose 10 -mca odls_

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
This is all on one node indeed. Attached is the output of mpirun -np 4 --mca plm_base_verbose 10 -mca odls_base_verbose 5 -mca state_base_verbose 5 -mca errmgr_base_verbose 5 ring_c |& tee output_ringc_verbose.txt Maxime On 2014-08-18 12:48, Ralph Castain wrote: This is all on one nod

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Ralph Castain
This is all on one node, yes? Try adding the following: -mca odls_base_verbose 5 -mca state_base_verbose 5 -mca errmgr_base_verbose 5 Lot of garbage, but should tell us what is going on. On Aug 18, 2014, at 9:36 AM, Maxime Boissonneault wrote: > Here it is > Le 2014-08-18 12:30, Joshua Ladd

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Here it is. On 2014-08-18 12:30, Joshua Ladd wrote: mpirun -np 4 --mca plm_base_verbose 10 [mboisson@helios-login1 examples]$ mpirun -np 4 --mca plm_base_verbose 10 ring_c [helios-login1:27853] mca: base: components_register: registering plm components [helios-login1:27853] mca: base: compone

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Joshua Ladd
Maxime, Can you run with: mpirun -np 4 --mca plm_base_verbose 10 /path/to/examples//ring_c On Mon, Aug 18, 2014 at 12:21 PM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > Hi, > I just did compile without Cuda, and the result is the same. No output, > exits with code

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-18 Thread Maxime Boissonneault
Hi, I just did compile without Cuda, and the result is the same. No output, exits with code 65. [mboisson@helios-login1 examples]$ ldd ring_c linux-vdso.so.1 => (0x7fff3ab31000) libmpi.so.1 => /software-gpu/mpi/openmpi/1.8.2rc4_gcc4.8_nocuda/lib/libmpi.so.1 (0x7fab9ec

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-16 Thread Maxime Boissonneault
There is indeed also a problem with MPI + Cuda. This problem, however, is deeper, since it happens with Mvapich2 1.9, OpenMPI 1.6.5/1.8.1/1.8.2rc4, Cuda 5.5.22/6.0.37. From my tests, everything works fine with MPI + Cuda on a single node, but as soon as I go to MPI + Cuda across nodes, I get s

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-16 Thread Jeff Squyres (jsquyres)
Just out of curiosity, I saw that one of the segv stack traces involved the cuda stack. Can you try a build without CUDA and see if that resolves the problem? On Aug 15, 2014, at 6:47 PM, Maxime Boissonneault wrote: > Hi Jeff, > > Le 2014-08-15 17:50, Jeff Squyres (jsquyres) a écrit : >> O

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Hi Jeff, On 2014-08-15 17:50, Jeff Squyres (jsquyres) wrote: On Aug 15, 2014, at 5:39 PM, Maxime Boissonneault wrote: Correct. Can it be because torque (pbs_mom) is not running on the head node and mpiexec attempts to contact it? Not for Open MPI's mpiexec, no. Open MPI's mpiexec (mp

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Jeff Squyres (jsquyres)
On Aug 15, 2014, at 5:39 PM, Maxime Boissonneault wrote: > Correct. > > Can it be because torque (pbs_mom) is not running on the head node and > mpiexec attempts to contact it ? Not for Open MPI's mpiexec, no. Open MPI's mpiexec (mpirun -- they're the same to us) will only try to use TM stu

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Correct. Can it be because torque (pbs_mom) is not running on the head node and mpiexec attempts to contact it? Maxime On 2014-08-15 17:31, Joshua Ladd wrote: But OMPI 1.8.x does run the ring_c program successfully on your compute node, right? The error only happens on the front-end logi

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Joshua Ladd
But OMPI 1.8.x does run the ring_c program successfully on your compute node, right? The error only happens on the front-end login node if I understood you correctly. Josh On Fri, Aug 15, 2014 at 5:20 PM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > Here are the reques

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Here are the requested files. In the archive, you will find the output of configure, make, make install as well as the config.log, the environment when running ring_c and the ompi_info --all. Just for a reminder, the ring_c example compiled and ran, but produced no output when running and ex

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Hi, I solved the warning that appeared with OpenMPI 1.6.5 on the login node. I increased the registrable memory. Now, with OpenMPI 1.6.5, it does not give any warning. Yet, with OpenMPI 1.8.1 and OpenMPI 1.8.2rc4, it still exits with error code 65 and does not produce the normal output. I w

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-15 Thread Maxime Boissonneault
Hi Josh, The ring_c example does not work on our login node : [mboisson@helios-login1 examples]$ mpiexec -np 10 ring_c [mboisson@helios-login1 examples]$ echo $? 65 [mboisson@helios-login1 examples]$ echo $LD_LIBRARY_PATH /software-gpu/mpi/openmpi/1.8.2rc4_gcc4.8_cuda6.0.37/lib:/usr/lib64/nvidia:

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
One more, Maxime, can you please make sure you've covered everything here: http://www.open-mpi.org/community/help/ Josh On Thu, Aug 14, 2014 at 3:18 PM, Joshua Ladd wrote: > And maybe include your LD_LIBRARY_PATH > > Josh > > > On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd wrote: > >> Can you

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
And maybe include your LD_LIBRARY_PATH Josh On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd wrote: > Can you try to run the example code "ring_c" across nodes? > > Josh > > > On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault < > maxime.boissonnea...@calculquebec.ca> wrote: > >> Yes, >> Every

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
Can you try to run the example code "ring_c" across nodes? Josh On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > Yes, > Everything has been built with GCC 4.8.x, although x might have changed > between the OpenMPI 1.8.1 build and the gromac

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
Yes, Everything has been built with GCC 4.8.x, although x might have changed between the OpenMPI 1.8.1 build and the gromacs build. For OpenMPI 1.8.2rc4 however, it was the exact same compiler for everything. Maxime On 2014-08-14 14:57, Joshua Ladd wrote: Hmmm...weird. Seems like maybe a m

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
Hmmm...weird. Seems like maybe a mismatch between libraries. Did you build OMPI with the same compiler as you did GROMACS/Charm++? I'm stealing this suggestion from an old Gromacs forum with essentially the same symptom: "Did you compile Open MPI and Gromacs with the same compiler (i.e. both gcc

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
I just tried Gromacs with two nodes. It crashes, but with a different error. I get [gpu-k20-13:142156] *** Process received signal *** [gpu-k20-13:142156] Signal: Segmentation fault (11) [gpu-k20-13:142156] Signal code: Address not mapped (1) [gpu-k20-13:142156] Failing at address: 0x8 [gpu-k20-1

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
What about between nodes? Since this is coming from the OpenIB BTL, it would be good to check this. Do you know what the MPI thread level is set to when used with the Charm++ runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is not thread safe. Josh On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boisson
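
A standalone sketch (not the Charm++ code itself) for checking which thread level the Open MPI runtime actually grants; if provided comes back lower than MPI_THREAD_MULTIPLE, concurrent MPI calls from multiple threads are not allowed at all.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int requested = MPI_THREAD_MULTIPLE, provided;

    /* Ask for full multi-threading and report what the library grants. */
    MPI_Init_thread(&argc, &argv, requested, &provided);
    printf("requested %d, provided %d (MPI_THREAD_MULTIPLE = %d)\n",
           requested, provided, MPI_THREAD_MULTIPLE);
    MPI_Finalize();
    return 0;
}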

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
Hi, I ran gromacs successfully with OpenMPI 1.8.1 and Cuda 6.0.37 on a single node, with 8 ranks and multiple OpenMP threads. Maxime On 2014-08-14 14:15, Joshua Ladd wrote: Hi, Maxime Just curious, are you able to run a vanilla MPI program? Can you try one of the example programs in

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Joshua Ladd
Hi Maxime, Just curious, are you able to run a vanilla MPI program? Can you try one of the example programs in the "examples" subdirectory? It looks like a threading issue to me. Thanks, Josh

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
Hi, I just did with 1.8.2rc4 and it does the same : [mboisson@helios-login1 simplearrayhello]$ ./hello [helios-login1:11739] *** Process received signal *** [helios-login1:11739] Signal: Segmentation fault (11) [helios-login1:11739] Signal code: Address not mapped (1) [helios-login1:11739] Failin

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Jeff Squyres (jsquyres)
Can you try the latest 1.8.2 rc tarball? (just released yesterday) http://www.open-mpi.org/software/ompi/v1.8/ On Aug 14, 2014, at 8:39 AM, Maxime Boissonneault wrote: > Hi, > I compiled Charm++ 6.6.0rc3 using > ./build charm++ mpi-linux-x86_64 smp --with-production > > When compiling

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-14 Thread Maxime Boissonneault
Note that if I do the same build with OpenMPI 1.6.5, it works flawlessly. Maxime On 2014-08-14 08:39, Maxime Boissonneault wrote: Hi, I compiled Charm++ 6.6.0rc3 using ./build charm++ mpi-linux-x86_64 smp --with-production When compiling the simple example mpi-linux-x86_64-smp/tests/charm+

Re: [OMPI users] Segmentation Fault

2014-03-21 Thread Jeff Squyres (jsquyres)
On Mar 21, 2014, at 3:26 AM, madhurima madhunapanthula wrote: > I am trying to link the jumpshot libraries with the graph500 (mpi_tuned_2d > sources). > After linking the libraries and executing mpirun with the > graph500_mpi_custome_n binaries I am getting the following segmentation fault. Are y

Re: [OMPI users] Segmentation Fault

2014-03-21 Thread Madison Stemm
Hi Madhurima, I'm also having this issue. While I can't provide any assistance, I'd be interested in being kept abreast of any solution as it may assist me as well. ~Maddie On Fri, Mar 21, 2014 at 12:26 AM, madhurima madhunapanthula < erankima...@gmail.com> wrote: > > Hi, > > Iam trying to lin

Re: [OMPI users] Segmentation fault on OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10

2013-12-24 Thread Jeff Hammond
Try pure PGI and pure GCC builds. If only the mixed one fails, then I saw a problem like this in MPICH a few months ago. It appears PGI does not play nice with GCC regarding the C standard library functions. Or at least that's what I concluded. The issue remains unresolved. Jeff Sent from my

Re: [OMPI users] Segmentation fault on OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10

2013-12-24 Thread Jeff Squyres (jsquyres)
I'm *very loosely* checking email. :-) Agree with what Ralph said: it looks like your program called memalign, and that ended up segv'ing. That could be an OMPI problem, or it could be an application problem. Try also configuring OMPI --with-valgrind and running your app through a memory-che

Re: [OMPI users] Segmentation fault on OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10

2013-12-23 Thread Ralph Castain
I fear that Jeff and Brian are both out for the holiday, Gus, so we are unlikely to have much info on this until they return I'm unaware of any such problems in 1.6.5. It looks like something isn't properly aligned in memory - could be an error on our part, but might be in the program. You migh

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-15 Thread tmishima
Further information: I first encountered this problem in openmpi-1.7.4.x, while openmpi-1.7.3 and 1.6.x work fine. My directory below is "testbed-openmpi-1.7.3", but it's really 1.7.4a1r29646. I'm sorry if I confuse you. [mishima@manage testbed-openmpi-1.7.3]$ ompi_info | grep "Open MPI:

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-15 Thread Ralph Castain
Indeed it should - most puzzling. I'll try playing with it on slurm using sbatch and see if I get the same behavior. Offhand, I can't see why the difference would exist unless somehow the script itself is taking one of the execution slots, and somehow Torque is accounting for it. Will have to e

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-14 Thread tmishima
Hi Ralph, It's no problem to let it lie until the problem becomes serious again. So, this is just information for you. I agree with your opinion that the problem lies in the modified hostfile. But strictly speaking, it's related to just adding the -hostfile option to mpirun in a Torque s

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 3:25 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, > > I checked -cpus-per-proc in openmpi-1.7.4a1r29646. > It works well as I want to do, which can adjust nprocs > of each nodes dividing by number of threads. > > I think my problem is solved so far using -cpus-per

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-14 Thread tmishima
Hi Ralph, I checked -cpus-per-proc in openmpi-1.7.4a1r29646. It works just as I want: it can adjust the nprocs of each node by dividing by the number of threads. I think my problem is solved so far using -cpus-per-proc, thank you very much. Regarding the oversubscribed problem, I checked NPROCS was

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-14 Thread Ralph Castain
FWIW: I verified that this works fine under a slurm allocation of 2 nodes, each with 12 slots. I filled the node without getting an "oversubscribed" error message [rhc@bend001 svn-trunk]$ mpirun -n 3 --bind-to core --cpus-per-proc 4 --report-bindings -hostfile hosts hostname [bend001:24318] MCW

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-14 Thread Ralph Castain
Also, you need to tell mpirun that the nodes aren't the same - add --hetero-nodes to your cmd line On Nov 13, 2013, at 10:14 PM, tmish...@jcity.maeda.co.jp wrote: > > > Thank you, Ralph! > > I didn't know about that function of cpus-per-proc. > As far as I know, it didn't work in openmpi-1.6.x lik

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-14 Thread tmishima
Thank you, Ralph! I didn't know about that function of cpus-per-proc. As far as I know, it didn't work in openmpi-1.6.x like that. It was just binding 4 cores... Today I don't have much time and I'll check it tomorrow. And thank you again for checking the oversubscription problem. tmishima > Guess I d

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-13 Thread Ralph Castain
Guess I don't see why modifying the allocation is required - we have mapping options that should support such things. If you specify the total number of procs you want, and cpus-per-proc=4, it should do the same thing I would think. You'd get 2 procs on the 8 slot nodes, 8 on the 32 proc nodes,

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-13 Thread tmishima
Our cluster consists of three types of nodes. They have 8, 32 and 64 slots respectively. Since the performance of each core is almost the same, mixed use of these nodes is possible. Furthermore, in this case, for a hybrid application with openmpi+openmp, the modification of the hostfile is necessary, as fo

Re: [OMPI users] Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646

2013-11-13 Thread Ralph Castain
Why do it the hard way? I'll look at the FAQ because that definitely isn't a recommended thing to do - better to use -host to specify the subset, or just specify the desired mapping using all the various mappers we provide. On Nov 13, 2013, at 6:39 PM, tmish...@jcity.maeda.co.jp wrote: > > >
