Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris
> On Oct 27, 2014, at 7:21 PM, Gilles Gouaillardet wrote:
>
> Ralph,
>
> On 2014/10/28 0:46, Ralph Castain wrote:
>> Actually, I propose to also remove that issue. Simple enough to use a
>> hash_table_32 to handle the jobids, and let that point to a
>> hash_table_32 of vpids. Since we rarely have more than one jobid
>> anyway, the memory overhead actually decreases with this model, and we
>> get rid of that annoying need to memcpy everything.
>
> sounds good to me.
> from an implementation/performance point of view, should we treat
> the local jobid differently ?
> (e.g. use a special variable for the hash_table_32 of the vpids of the
> current jobid)

Not entirely sure - let’s see as we go. My initial thought is “no”, but since the use of dynamic jobs is so rare, it might make sense.

>>> as far as i am concerned, i am fine with your proposed suggestion to
>>> dump opal_identifier_t.
>>>
>>> about the patch, did you mean you have something ready i can apply to my PR ?
>>> or do you expect me to do the changes (i am ok to do it if needed)
>>
>> Why don’t I grab your branch, create a separate repo based on it (just to
>> keep things clean), push it to my area and give you write access? We can
>> then collaborate on the changes and create a PR from there. This way, you
>> don’t need to give me write access to your entire repo.
>>
>> Make sense?
>
> ok to work on another "somehow shared" repo for that issue.
> i am not convinced you should grab my branch since all the changes i
> made will be no longer valid.
> anyway, feel free to fork a repo from my branch or the master and i will
> work from here.

Okay, I’ll set something up tomorrow

> Cheers,
>
> Gilles
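For readers following along, here is a minimal sketch of the two-level lookup Ralph describes: an outer 32-bit hash table keyed by jobid whose values are inner 32-bit hash tables keyed by vpid. The opal_hash_table_* names and signatures are recalled from the OPAL class library and the lookup_proc helper is hypothetical, so treat this as an illustration of the idea rather than the eventual patch.

/* Sketch only: two-level jobid -> vpid lookup (assumed opal_hash_table_* API). */
#include "opal/constants.h"
#include "opal/class/opal_hash_table.h"

/* outer table: jobid -> per-job table of vpids */
static opal_hash_table_t job_table;

static void *lookup_proc(uint32_t jobid, uint32_t vpid)
{
    opal_hash_table_t *vpid_table = NULL;
    void *proc = NULL;

    /* first level: find the per-job vpid table */
    if (OPAL_SUCCESS != opal_hash_table_get_value_uint32(&job_table, jobid,
                                                         (void **)&vpid_table)) {
        return NULL;
    }
    /* second level: find the proc data for this vpid */
    if (OPAL_SUCCESS != opal_hash_table_get_value_uint32(vpid_table, vpid, &proc)) {
        return NULL;
    }
    return proc;
}

Since there is rarely more than one jobid, the outer lookup is almost always a single-entry hit, which is where the memory savings and the removal of the memcpy that Ralph mentions come from.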
Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?
Marco,

here is attached a patch that fixes the issue
/* i could not find yet why this does not occur on Linux ... */

could you please give it a try ?

Cheers,

Gilles

On 2014/10/27 18:45, Marco Atzeri wrote:
> On 10/27/2014 10:30 AM, Gilles Gouaillardet wrote:
>> Hi,
>>
>> i tested on a RedHat 6 like linux server and could not observe any
>> memory leak.
>>
>> BTW, are you running 32 or 64 bits cygwin ? and what is your configure
>> command line ?
>>
>> Thanks,
>>
>> Gilles
>
> the problem is present in both versions.
>
> cygwin 1.8.3-1 packages are built with configure:
>
> --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin
> --sbindir=/usr/sbin --libexecdir=/usr/libexec --datadir=/usr/share
> --localstatedir=/var --sysconfdir=/etc --libdir=/usr/lib
> --datarootdir=/usr/share --docdir=/usr/share/doc/openmpi
> --htmldir=/usr/share/doc/openmpi/html -C
> LDFLAGS=-Wl,--export-all-symbols --disable-mca-dso
> --disable-sysv-shmem --enable-cxx-exceptions --with-threads=posix
> --without-cs-fs --with-mpi-param_check=always
> --enable-contrib-no-build=vt,libompitrace
> --enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv
>
> Regards
> Marco

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index 7c8853f..c4a 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -16,6 +16,8 @@
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  *                         reserved.
  * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -152,11 +154,16 @@ static void mca_pml_ob1_recv_request_construct(mca_pml_ob1_recv_request_t* reque
     OBJ_CONSTRUCT(&request->lock, opal_mutex_t);
 }
 
+static void mca_pml_ob1_recv_request_destruct(mca_pml_ob1_recv_request_t* request)
+{
+    OBJ_DESTRUCT(&request->lock);
+}
+
 OBJ_CLASS_INSTANCE(
     mca_pml_ob1_recv_request_t,
     mca_pml_base_recv_request_t,
     mca_pml_ob1_recv_request_construct,
-    NULL);
+    mca_pml_ob1_recv_request_destruct);
 
 /*
[OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc
Hi,

today I installed openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc
with gcc-4.9.1 and Java 8. Now a very simple Java program works
as expected, but other Java programs still break. I removed the
warnings about "shmem.jar" and used the following configure
command.

tyr openmpi-dev-178-ga16c1e4-SunOS.sparc.64_gcc 406 head config.log \
  | grep openmpi
$ ../openmpi-dev-178-ga16c1e4/configure
  --prefix=/usr/local/openmpi-1.9.0_64_gcc
  --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin
  --with-jdk-headers=/usr/local/jdk1.8.0/include
  JAVA_HOME=/usr/local/jdk1.8.0
  LDFLAGS=-m64 CC=gcc CXX=g++ FC=gfortran CFLAGS=-m64 -D_REENTRANT
  CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp
  CPPFLAGS= -D_REENTRANT CXXCPPFLAGS=
  --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java
  --enable-mpi-thread-multiple --with-threads=posix
  --with-hwloc=internal
  --without-verbs --with-wrapper-cflags=-std=c11 -m64
  --with-wrapper-cxxflags=-m64 --enable-debug

tyr java 290 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler version:"
  Open MPI repo revision: dev-178-ga16c1e4
  C compiler version: 4.9.1

> > regarding the BUS error reported by Siegmar, i also commited
> > 62bde1fcb554079143030bb305512c236672386f
> > in order to fix it (this is based on code review only, i have no sparc64
> > hardware to test it is enough)
>
> I'll test it, when a new nightly snapshot is available for the trunk.

tyr java 291 mpijavac InitFinalizeMain.java
tyr java 292 mpiexec -np 1 java InitFinalizeMain
Hello!

tyr java 293 mpijavac BcastIntMain.java
tyr java 294 mpiexec -np 2 java BcastIntMain
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24792, tid=2
...

tyr java 296 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
...
(gdb) run -np 2 java BcastIntMain
Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 2 java BcastIntMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24814, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C  [mca_pmix_native.so+0x10bfc]  native_get_attr+0x3000
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping,
# try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid24814.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24812, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C  [mca_pmix_native.so+0x10bfc]  native_get_attr+0x3000
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping,
# try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid24812.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[tyr:24814] *** Process received signal ***
[tyr:24814] Signal: Abort (6)
[tyr:24814] Signal code:  (-1)
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdc2d4
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/lib/sparcv9/libc.so.1:0xdd2d0 [ Signal 6 (ABRT)]
/lib/sparcv9/libc.so.1:_thr_sigsetmask+0x1c4
/lib/sparcv9/libc.so.1:sigprocmask+0x28
/lib/sparcv9/libc.so.1:_sigrelse+0x5c
/lib/sparcv9/libc.so.1:abort+0xc0
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xb3cb90
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xd97a04
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:JVM_handle_solaris_signal+0xc0c
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xb44e84
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x10bfc [ Signal 10 (BUS)]
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init
Re: [OMPI users] OpenMPI 1.8.3 configure fails, Mac OS X 10.9.5, Intel Compilers
It sounds like your Intel compiler installation is broken -- these "present but
cannot be compiled" errors usually indicate that the compiler itself has some
kind of local conflict that is unrelated to Open MPI (that's why we put those
tests in OMPI's configure: so that we can detect such problems early, during
configure, rather than during the build of Open MPI itself).

Can you send all the information listed here:

http://www.open-mpi.org/community/help/

On Oct 27, 2014, at 2:06 PM, Bosler, Peter Andrew wrote:
> Good morning,
>
> I’m trying to build OpenMPI with the Intel 14.01 compilers with the following
> configure line
>
> ./configure --prefix=/opt/openmpi-1.8.3/intel-14.01 CC=icc CXX=icpc FC=ifort
>
> on a 6-core 3.5 GHz Intel Xeon E5 Mac Pro running Mac OS X 10.9.5.
>
> Configure outputs a pthread error, complaining that different threads don’t
> have the same PID.
> I also get the same error with OpenMPI 1.8.2 and the Intel compilers.
> I was able to build OpenMPI 1.8.3 with both LLVM 5.1 and GCC 4.9, so something
> is going wrong with the Intel compilers' threading interface.
>
> Interestingly, OpenMPI 1.8.3 and the Intel 14.01 compilers work fine on my
> MacBook Pro: same OS, different CPU (2.8 GHz Intel Core i7), same configure
> line.
>
> Is there an environment variable or configure option that I need to set to
> avoid this error on the Mac Pro?
>
> Thanks for your help.
>
> Pete Bosler
>
> P.S. The specific warnings and error from openmpi-1.8.3/configure are the
> following (and the whole output file is attached):
>
> ... Lots of output ...
> configure: WARNING: ulimit.h: present but cannot be compiled
> configure: WARNING: ulimit.h: check for missing prerequisite headers?
> configure: WARNING: ulimit.h: see the Autoconf documentation
> configure: WARNING: ulimit.h: section "Present But Cannot Be Compiled"
> configure: WARNING: ulimit.h: proceeding with the compiler's result
> configure: WARNING: ## ------------------------------------------------------ ##
> configure: WARNING: ## Report this to http://www.open-mpi.org/community/help/ ##
> configure: WARNING: ## ------------------------------------------------------ ##
> ... Lots more output ...
> checking if threads have different pids (pthreads on linux)... yes
> configure: WARNING: This version of Open MPI only supports environments where
> configure: WARNING: threads have the same PID. Please use an older version of
> configure: WARNING: Open MPI if you need support on systems with different
> configure: WARNING: PIDs for threads in the same process. Open MPI 1.4.x
> configure: WARNING: supports such systems, as does at least some versions of the
> configure: WARNING: Open MPI 1.5.x series.
> configure: error: Cannot continue

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
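For readers unfamiliar with that configure check: it compiles and runs a small pthread program and compares the PID seen from two threads. The snippet below is a rough standalone equivalent written only for illustration (it is not the actual configure test source, and the description of the probe's mechanics is an assumption). Old LinuxThreads reported a different PID per thread, while NPTL and OS X report the same one, so a "yes" answer on a Mac usually means the probe itself failed to build or misbehaved under the compiler in question.

/* Rough standalone equivalent of the "threads have different pids" probe. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pid_t child_pid;

static void *worker(void *arg)
{
    (void)arg;
    child_pid = getpid();          /* PID as seen from the spawned thread */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pid_t main_pid = getpid();     /* PID as seen from the main thread */

    if (pthread_create(&t, NULL, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_join(t, NULL);

    printf("main pid=%ld thread pid=%ld -> %s\n",
           (long)main_pid, (long)child_pid,
           main_pid == child_pid ? "same PID" : "different PIDs");
    return 0;
}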
Re: [OMPI users] MPI_Init seems to hang, but works after a, minute or two
On Oct 27, 2014, at 1:25 PM, maxinator333 wrote:
> Deactivating my WLAN did indeed do the trick!
> It also seems not to work if a LAN cable is plugged in. It makes no difference
> whether I am correctly connected (to the internet/gateway) or not (wrong IP,
> e.g. a static IP instead of the mandatory DHCP).
> Again: deactivating the relevant LAN helps.
> It seems that, in contrast to LAN, for WLAN it makes a difference whether I'm
> connected to some network or not. If not connected, it seems to work without
> deactivating the whole hardware.

If you're only running on a single machine, you can deactivate the network
transports in Open MPI and only use the shared memory transport. That should
allow you to run without deactivating any hardware. E.g.

mpirun --mca btl sm,self ...

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] MPI_Init seems to hang, but works after a, minute or two
It doesn't seem to work. (switching off wlan still works)

mpicc mpiinit.c -o mpiinit.exe; time mpirun --mca btl sm,self -n 2 ./mpiinit.exe

real    0m43.733s
user    0m0.888s
sys     0m0.824s

On 28.10.2014 13:40, Jeff Squyres (jsquyres) wrote:
> On Oct 27, 2014, at 1:25 PM, maxinator333 wrote:
>> Deactivating my WLAN did indeed do the trick!
>> It also seems not to work if a LAN cable is plugged in. It makes no difference
>> whether I am correctly connected (to the internet/gateway) or not (wrong IP,
>> e.g. a static IP instead of the mandatory DHCP).
>> Again: deactivating the relevant LAN helps.
>> It seems that, in contrast to LAN, for WLAN it makes a difference whether I'm
>> connected to some network or not. If not connected, it seems to work without
>> deactivating the whole hardware.
>
> If you're only running on a single machine, you can deactivate the network
> transports in Open MPI and only use the shared memory transport. That should
> allow you to run without deactivating any hardware. E.g.
>
> mpirun --mca btl sm,self ...
Re: [OMPI users] MPI_Init seems to hang, but works after a, minute or two
On Oct 28, 2014, at 9:02 AM, maxinator333 wrote:
> It doesn't seem to work. (switching off wlan still works)
>
> mpicc mpiinit.c -o mpiinit.exe; time mpirun --mca btl sm,self -n 2 ./mpiinit.exe
>
> real    0m43.733s
> user    0m0.888s
> sys     0m0.824s

Ah, this must be an ORTE issue, then (i.e., the run-time system beneath the MPI
layer). Try specifying that ORTE should use the loopback interface:

mpirun --mca btl sm,self --mca oob_tcp_if_include lo ...

(actually, I don't know what the loopback interface is called on Windows; it's
typically "lo" in Linux 2.6 kernels...)

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
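If those options cure the startup delay, one way to avoid retyping them on every run is to put them in the per-user MCA parameter file. The path below is the usual default location and is an assumption for this installation; adjust the interface name if the loopback is not called "lo" on your system.

# $HOME/.openmpi/mca-params.conf  (per-user MCA defaults)
btl = sm,self
oob_tcp_if_include = lo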
Re: [OMPI users] Java FAQ Page out of date
Thanks Brock; I opened https://github.com/open-mpi/ompi/issues/254 to track the issue.

On Oct 27, 2014, at 12:57 AM, Brock Palen wrote:
> I think a lot of the information on this page:
>
> http://www.open-mpi.org/faq/?category=java
>
> is out of date with the 1.8 release.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?
On 10/28/2014 12:04 PM, Gilles Gouaillardet wrote:
> Marco,
>
> here is attached a patch that fixes the issue
> /* i could not find yet why this does not occur on Linux ... */
>
> could you please give it a try ?
>
> Cheers,
>
> Gilles

It solves the issue on 64 bit.
I see no growing memory usage anymore.

I will build 32 bit and then upload both as 1.8.3-2.

Thanks
Marco
Re: [OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc
Hi Siegmar,

From the jvm logs, there is an alignment error in native_get_attr
but i could not find it by reading the source code.

Could you please do
ulimit -c unlimited
mpiexec ...
and then
gdb /bin/java core
and run bt on all threads until you get a line number in native_get_attr

Thanks

Gilles

Siegmar Gross wrote:
> Hi,
>
> today I installed openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc
> with gcc-4.9.1 and Java 8. Now a very simple Java program works
> as expected, but other Java programs still break. I removed the
> warnings about "shmem.jar" and used the following configure
> command.
> ...
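For context on the alignment error Gilles suspects: SPARC requires loads and stores to be naturally aligned, so dereferencing a 64-bit value at an odd address raises SIGBUS there, while x86 silently tolerates it -- which is why this crash only shows up on Solaris/SPARC. The snippet below is only an illustration of that class of bug (it is not the native_get_attr code); copying through memcpy is the usual portable fix.

/* Illustration of a SPARC alignment fault and the memcpy workaround. */
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[16];
    uint64_t value = 0x1122334455667788ULL, copy;

    /* An unaligned direct load such as
     *     uint64_t bad = *(uint64_t *)(buf + 1);
     * raises SIGBUS on SPARC because buf + 1 is not 8-byte aligned;
     * on x86 the same load silently succeeds. */

    /* Portable alternative: move the bytes with memcpy instead. */
    memcpy(buf + 1, &value, sizeof(value));
    memcpy(&copy, buf + 1, sizeof(copy));
    printf("copied back: 0x%016" PRIx64 "\n", copy);
    return 0;
}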
Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?
Thanks Marco,

pthread_mutex_init calls calloc under cygwin but does not allocate memory under
linux, so not invoking pthread_mutex_destroy causes a memory leak only under cygwin.

Gilles

Marco Atzeri wrote:
> On 10/28/2014 12:04 PM, Gilles Gouaillardet wrote:
>> Marco,
>>
>> here is attached a patch that fixes the issue
>> /* i could not find yet why this does not occur on Linux ... */
>>
>> could you please give it a try ?
>>
>> Cheers,
>>
>> Gilles
>
> It solves the issue on 64 bit.
> I see no growing memory usage anymore
>
> I will build 32 bit and then upload both as 1.8.3-2
>
> Thanks
> Marco
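In other words, the leak comes from constructing receive requests whose mutex is initialised but never destroyed. Below is a minimal standalone illustration of the pattern the patch closes; it uses plain pthreads and a hypothetical request_t type, and assumes nothing about the ob1 PML beyond what the diff shows.

/* Every pthread_mutex_init must be paired with pthread_mutex_destroy:
 * an implementation is allowed to allocate in init (as Cygwin's does)
 * and only release that memory in destroy. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    /* ... request fields ... */
} request_t;

static void request_construct(request_t *req)
{
    pthread_mutex_init(&req->lock, NULL);   /* may allocate internally */
}

static void request_destruct(request_t *req)
{
    pthread_mutex_destroy(&req->lock);      /* releases that allocation */
}

int main(void)
{
    request_t req;
    request_construct(&req);
    /* ... use the request ... */
    request_destruct(&req);                 /* without this: leak on Cygwin */
    return 0;
}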
[OMPI users] Allgather in OpenMPI 1.4.3
Hi,

I know 1.4.3 is really old but I am currently stuck with it. However, there
seems to be a bug in Allgather. I have attached the source of an example program.

The output I would expect is:

rettenbs@hpcsccs4:/tmp$ mpiexec -np 5 ./a.out
0 0 1 2
1 0 1 2
2 0 1 2
3 0 1 2
4 0 1 2

But what I get is different results when I run the program multiple times:

rettenbs@hpcsccs4:/tmp$ mpiexec -np 5 ./a.out
0 0 1 2
1 0 1 2
2 0 1 2
3 2000 2001 2002
4 0 1 2

rettenbs@hpcsccs4:/tmp$ mpiexec -np 5 ./a.out
0 0 1 2
1 0 1 2
2 0 1 2
3 2000 2001 2002
4 3000 3001 3002

This bug is probably already fixed. Does anybody know in which version?

Best regards,
Sebastian

--
Sebastian Rettenberger, M.Sc.
Technische Universität München
Department of Informatics
Chair of Scientific Computing
Boltzmannstrasse 3, 85748 Garching, Germany
http://www5.in.tum.de/

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int s[25];
    for (int i = 0; i < 25; i++)
        s[i] = rank*1000 + i;

    int r[500];
    MPI_Allgather(s, 25, MPI_INT, r, 25, MPI_INT, MPI_COMM_WORLD);

    std::cout << rank << ' ' << r[0] << ' ' << r[1] << ' ' << r[2] << std::endl;

    MPI_Finalize();
}
Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?
On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote:
> Thanks Marco,
>
> pthread_mutex_init calls calloc under cygwin but does not allocate memory
> under linux, so not invoking pthread_mutex_destroy causes a memory leak
> only under cygwin.
>
> Gilles

thanks for the work.

uploading 1.8.3-2 on www.cygwin.com

Regards
Marco
Re: [OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc
Hi Gilles,

> From the jvm logs, there is an alignment error in native_get_attr
> but i could not find it by reading the source code.
>
> Could you please do
> ulimit -c unlimited
> mpiexec ...
> and then
> gdb /bin/java core
> And run bt on all threads until you get a line number in native_get_attr

I found pmix_native.c:1131 in native_get_attr, attached gdb to the Java
process and set a breakpoint to this line. From there I single stepped
until I got SIGSEGV, so that you can see what happened.

(gdb) b pmix_native.c:1131
No source file named pmix_native.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (pmix_native.c:1131) pending.
(gdb) thread 14
[Switching to thread 14 (Thread 2 (LWP 2))]
#0  0x7eadc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
(gdb) f 3
#3  0xfffee5122230 in JNI_OnLoad (vm=0x7e57e9d8 , reserved=0x0)
    at ../../../../../openmpi-dev-178-ga16c1e4/ompi/mpi/java/c/mpi_MPI.c:128
128         while (_dbg) poll(NULL, 0, 1);
(gdb) set _dbg=0
(gdb) c
Continuing.
[New LWP    13]

Breakpoint 1, native_get_attr (attr=0xfffee2e05db0 "pmix.jobid", kv=0x7b4ff028)
    at ../../../../../openmpi-dev-178-ga16c1e4/opal/mca/pmix/native/pmix_native.c:1131
1131        OPAL_OUTPUT_VERBOSE((1, opal_pmix_base_framework.framework_output,
(gdb) s
opal_proc_local_get () at ../../../openmpi-dev-178-ga16c1e4/opal/util/proc.c:80
80          return opal_proc_my_name;
(gdb)
81      }
(gdb)
_process_name_print_for_opal (procname=14259803799433510912)
    at ../../openmpi-dev-178-ga16c1e4/orte/runtime/orte_init.c:64
64          orte_process_name_t* rte_name = (orte_process_name_t*)&procname;
(gdb)
65          return ORTE_NAME_PRINT(rte_name);
(gdb)
orte_util_print_name_args (name=0x7b4feb90)
    at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:122
122         if (NULL == name) {
(gdb)
142         job = orte_util_print_jobids(name->jobid);
(gdb)
orte_util_print_jobids (job=3320119297)
    at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:170
170         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr);
(gdb)
opal_tsd_getspecific (key=4, valuep=0x7b4fe8a0)
    at ../../openmpi-dev-178-ga16c1e4/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_jobids (job=3320119297)
    at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:172
172         if (NULL == ptr) {
(gdb)
178         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
179             ptr->cntr = 0;
(gdb)
182         if (ORTE_JOBID_INVALID == job) {
(gdb)
184         } else if (ORTE_JOBID_WILDCARD == job) {
(gdb)
187             tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
(gdb)
188             tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
(gdb)
189             snprintf(ptr->buffers[ptr->cntr++],
(gdb)
193         return ptr->buffers[ptr->cntr-1];
(gdb)
194     }
(gdb)
orte_util_print_name_args (name=0x7b4feb90)
    at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:143
143         vpid = orte_util_print_vpids(name->vpid);
(gdb)
orte_util_print_vpids (vpid=0) at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:260
260         ptr = get_print_name_buffer();
(gdb)
get_print_name_buffer () at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:92
92          if (!fns_init) {
(gdb)
101         ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr);
(gdb)
opal_tsd_getspecific (key=4, valuep=0x7b4fe8b0)
    at ../../openmpi-dev-178-ga16c1e4/opal/threads/tsd.h:163
163         *valuep = pthread_getspecific(key);
(gdb)
164         return OPAL_SUCCESS;
(gdb)
165     }
(gdb)
get_print_name_buffer () at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:102
102         if (OPAL_SUCCESS != ret) return NULL;
(gdb)
104         if (NULL == ptr) {
(gdb)
113         return (orte_print_args_buffers_t*) ptr;
(gdb)
114     }
(gdb)
orte_util_print_vpids (vpid=0) at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:262
262         if (NULL == ptr) {
(gdb)
268         if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb)
272         if (ORTE_VPID_INVALID == vpid) {
(gdb)
274         } else if (ORTE_VPID_WILDCARD == vpid) {
(gdb)
277             snprintf(ptr->buffers[ptr->cntr++],
(gdb)
281         return ptr->buffers[ptr->cntr-1];
(gdb)
282     }
(gdb)
orte_util_print_name_args (name=0x7b4feb90)
    at ../../openmpi-dev-178-ga16c1e4/or
Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?
Gilles: will you be committing this to trunk and PR to 1.8?

> On Oct 28, 2014, at 11:05 AM, Marco Atzeri wrote:
>
> On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote:
>> Thanks Marco,
>>
>> pthread_mutex_init calls calloc under cygwin but does not allocate memory
>> under linux, so not invoking pthread_mutex_destroy causes a memory leak
>> only under cygwin.
>>
>> Gilles
>
> thanks for the work.
>
> uploading 1.8.3-2 on www.cygwin.com
>
> Regards
> Marco
Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?
Yep, will do today

Ralph Castain wrote:
> Gilles: will you be committing this to trunk and PR to 1.8?
>
>> On Oct 28, 2014, at 11:05 AM, Marco Atzeri wrote:
>>
>> On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote:
>>> Thanks Marco,
>>>
>>> pthread_mutex_init calls calloc under cygwin but does not allocate memory
>>> under linux, so not invoking pthread_mutex_destroy causes a memory leak
>>> only under cygwin.
>>>
>>> Gilles
>>
>> thanks for the work.
>>
>> uploading 1.8.3-2 on www.cygwin.com
>>
>> Regards
>> Marco