Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-28 Thread Ralph Castain

> On Oct 27, 2014, at 7:21 PM, Gilles Gouaillardet  wrote:
> 
> Ralph,
> 
> On 2014/10/28 0:46, Ralph Castain wrote:
>> Actually, I propose to also remove that issue. Simple enough to use a
>> hash_table_32 to handle the jobids, and let that point to a
>> hash_table_32 of vpids. Since we rarely have more than one jobid
>> anyway, the memory overhead actually decreases with this model, and we
>> get rid of that annoying need to memcpy everything. 
> sounds good to me.
> from an implementation/performance point of view, should we treat
> the local jobid differently?
> (e.g. use a special variable for the hash_table_32 of the vpids of the
> current jobid)

Not entirely sure - let’s see as we go. My initial thought is “no”, but since 
the use of dynamic jobs is so rare, it might make sense.
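
For illustration, the two-level lookup could be sketched as below (a sketch only, not committed code; it assumes the uint32 accessors from opal/class/opal_hash_table.h, and lookup_proc/jobid_table are hypothetical names):

#include <stdint.h>
#include "opal/class/opal_hash_table.h"

/* outer table: jobid -> inner table; inner table: vpid -> proc pointer.
 * with a single jobid there is one outer entry, and lookups need no
 * memcpy of a 64-bit identifier */
static opal_hash_table_t jobid_table;

static void *lookup_proc(uint32_t jobid, uint32_t vpid)
{
    opal_hash_table_t *vpid_table = NULL;
    void *proc = NULL;

    if (OPAL_SUCCESS != opal_hash_table_get_value_uint32(&jobid_table, jobid,
                                                         (void **) &vpid_table)) {
        return NULL;    /* unknown jobid */
    }
    if (OPAL_SUCCESS != opal_hash_table_get_value_uint32(vpid_table, vpid,
                                                         &proc)) {
        return NULL;    /* unknown vpid within that job */
    }
    return proc;
}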


>>> as far as i am concerned, i am fine with your proposed suggestion to
>>> dump opal_identifier_t.
>>> 
>>> about the patch, did you mean you have something ready i can apply to my
>>> PR ?
>>> or do you expect me to do the changes (i am ok to do it if needed)
>> Why don’t I grab your branch, create a separate repo based on it (just to 
>> keep things clean), push it to my area and give you write access? We can 
>> then collaborate on the changes and create a PR from there. This way, you 
>> don’t need to give me write access to your entire repo.
>> 
>> Make sense?
> ok to work on another "somehow shared" repo for that issue.
> i am not convinced you should grab my branch since all the changes i
> made will no longer be valid.
> anyway, feel free to fork a repo from my branch or the master and i will
> work from there.

Okay, I’ll set something up tomorrow.

> 
> Cheers,
> 
> Gilles
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25621.php



Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Gilles Gouaillardet
Marco,

attached is a patch that fixes the issue
/* i could not yet find why this does not occur on Linux ... */

could you please give it a try?

Cheers,

Gilles

On 2014/10/27 18:45, Marco Atzeri wrote:
>
>
> On 10/27/2014 10:30 AM, Gilles Gouaillardet wrote:
>> Hi,
>>
>> i tested on a RedHat 6 like linux server and could not observe any
>> memory leak.
>>
>> BTW, are you running 32 or 64 bit cygwin? and what is your configure
>> command line?
>>
>> Thanks,
>>
>> Gilles
>>
>
> the problem is present in both versions.
>
> cygwin 1.8.3-1 packages are built with configure:
>
>  --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin
> --sbindir=/usr/sbin --libexecdir=/usr/libexec --datadir=/usr/share
> --localstatedir=/var --sysconfdir=/etc --libdir=/usr/lib
> --datarootdir=/usr/share --docdir=/usr/share/doc/openmpi
> --htmldir=/usr/share/doc/openmpi/html -C
> LDFLAGS=-Wl,--export-all-symbols --disable-mca-dso
> --disable-sysv-shmem --enable-cxx-exceptions --with-threads=posix
> --without-cs-fs --with-mpi-param_check=always
> --enable-contrib-no-build=vt,libompitrace
> --enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv
>
> Regards
> Marco
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25604.php

diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index 7c8853f..c4a 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -16,6 +16,8 @@
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  *                         reserved.
  * Copyright (c) 2012      FUJITSU LIMITED.  All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -152,11 +154,16 @@ static void mca_pml_ob1_recv_request_construct(mca_pml_ob1_recv_request_t* reque
     OBJ_CONSTRUCT(&request->lock, opal_mutex_t);
 }
 
+static void mca_pml_ob1_recv_request_destruct(mca_pml_ob1_recv_request_t* request)
+{
+    OBJ_DESTRUCT(&request->lock);
+}
+
 OBJ_CLASS_INSTANCE(
     mca_pml_ob1_recv_request_t,
     mca_pml_base_recv_request_t,
     mca_pml_ob1_recv_request_construct,
-    NULL);
+    mca_pml_ob1_recv_request_destruct);
 
 
 /*


[OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc

2014-10-28 Thread Siegmar Gross
Hi,

today I installed openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc
with gcc-4.9.1 and Java 8. Now a very simple Java program works
as expected, but other Java programs still break. I removed the
warnings about "shmem.jar" and used the following configure
command.

tyr openmpi-dev-178-ga16c1e4-SunOS.sparc.64_gcc 406 head config.log \
  | grep openmpi
$ ../openmpi-dev-178-ga16c1e4/configure
  --prefix=/usr/local/openmpi-1.9.0_64_gcc
  --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin
  --with-jdk-headers=/usr/local/jdk1.8.0/include
  JAVA_HOME=/usr/local/jdk1.8.0
  LDFLAGS=-m64 CC=gcc CXX=g++ FC=gfortran CFLAGS=-m64 -D_REENTRANT
  CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp
  CPPFLAGS= -D_REENTRANT CXXCPPFLAGS=
  --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java
  --enable-mpi-thread-multiple --with-threads=posix
  --with-hwloc=internal
  --without-verbs --with-wrapper-cflags=-std=c11 -m64
  --with-wrapper-cxxflags=-m64 --enable-debug


tyr java 290 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler 
version:"
  Open MPI repo revision: dev-178-ga16c1e4
  C compiler version: 4.9.1



> > regarding the BUS error reported by Siegmar, i also committed
> > 62bde1fcb554079143030bb305512c236672386f
> > in order to fix it (this is based on code review only, i have no sparc64
> > hardware to verify it is sufficient)
> 
> I'll test it when a new nightly snapshot is available for the trunk.


tyr java 291 mpijavac InitFinalizeMain.java 
tyr java 292 mpiexec -np 1 java InitFinalizeMain
Hello!

tyr java 293 mpijavac BcastIntMain.java 
tyr java 294 mpiexec -np 2 java BcastIntMain
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24792, tid=2
...



tyr java 296 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
...
(gdb) run -np 2 java BcastIntMain
Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 2 java 
BcastIntMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24814, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc 
compressed oops)
# Problematic frame:
# C  [mca_pmix_native.so+0x10bfc]  native_get_attr+0x3000
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid24814.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24812, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc 
compressed oops)
# Problematic frame:
# C  [mca_pmix_native.so+0x10bfc]  native_get_attr+0x3000
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid24812.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[tyr:24814] *** Process received signal ***
[tyr:24814] Signal: Abort (6)
[tyr:24814] Signal code:  (-1)
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdc2d4
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/lib/sparcv9/libc.so.1:0xdd2d0 [ Signal 6 (ABRT)]
/lib/sparcv9/libc.so.1:_thr_sigsetmask+0x1c4
/lib/sparcv9/libc.so.1:sigprocmask+0x28
/lib/sparcv9/libc.so.1:_sigrelse+0x5c
/lib/sparcv9/libc.so.1:abort+0xc0
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xb3cb90
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xd97a04
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:JVM_handle_solaris_signal+0xc0c
/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xb44e84
/lib/sparcv9/libc.so.1:0xd8b98
/lib/sparcv9/libc.so.1:0xcc70c
/lib/sparcv9/libc.so.1:0xcc918
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x10bfc
 [ Signal 10 (BUS)]
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c
/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init

Re: [OMPI users] OpenMPI 1.8.3 configure fails, Mac OS X 10.9.5, Intel Compilers

2014-10-28 Thread Jeff Squyres (jsquyres)
It sounds like your Intel compiler installation is broken -- these types of 
"present but not compilable" errors usually indicate that the compiler itself 
has some kind of local conflict unrelated to Open MPI (that's why we put those 
tests in OMPI's configure -- so that we can detect such problems early, during 
configure, rather than during the build of Open MPI itself).
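
For context, a header check of this kind boils down to compiling a tiny program that does nothing but include the header -- roughly the sketch below (illustrative only; configure generates its own conftest, this is not its literal source). "Present" means the preprocessor found ulimit.h; "cannot be compiled" means this program then failed to compile with the selected compiler and flags.

/* roughly what configure's ulimit.h check compiles; if the compiler
 * rejects this, the header is "present but cannot be compiled" */
#include <ulimit.h>

int main(void)
{
    return 0;
}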

Can you send all the information listed here:

http://www.open-mpi.org/community/help/



On Oct 27, 2014, at 2:06 PM, Bosler, Peter Andrew  wrote:

> Good morning,
> 
> I’m trying to build OpenMPI with the Intel 14.01 compilers with the following 
> configure line
> ./configure --prefix=/opt/openmpi-1.8.3/intel-14.01 CC=icc CXX=icpc FC=ifort
> On a 6-core 3.5 GHz Intel Xeon E5 Mac Pro running Mac OS X 10.9.5.  
> 
> Configure outputs a pthread error, complaining that different threads don’t 
> have the same PID.
> I also get the same error with OpenMPI 1.8.2 and the Intel compilers.   
> I was able to build OpenMPI 1.8.3 with both LLVM 5.1 and GCC 4.9 so something 
> is going wrong with the Intel compilers threading interface.  
> 
> Interestingly, OpenMPI 1.8.3 and the Intel 14.01 compilers work fine on my 
> MacBook Pro: same OS, different CPU (2.8 GHz Intel Core i7), same configure 
> line.
> 
> Is there an environment variable or configure option that I need to set to 
> avoid this error on the Mac Pro?
> 
> Thanks for your help.
> 
> Pete Bosler
> 
> P.S. The specific warnings and error from openmpi-1.8.3/configure are the 
> following (and the whole output file is attached):
> 
> … Lots of output …
> configure: WARNING: ulimit.h: present but cannot be compiled
> configure: WARNING: ulimit.h: check for missing prerequisite headers?
> configure: WARNING: ulimit.h: see the Autoconf documentation
> configure: WARNING: ulimit.h: section "Present But Cannot Be Compiled"
> configure: WARNING: ulimit.h: proceeding with the compiler's result
> configure: WARNING: ## ------------------------------------------------------ ##
> configure: WARNING: ## Report this to http://www.open-mpi.org/community/help/ ##
> configure: WARNING: ## ------------------------------------------------------ ##
> … Lots more output …
> checking if threads have different pids (pthreads on linux)... yes
> configure: WARNING: This version of Open MPI only supports environments where
> configure: WARNING: threads have the same PID.  Please use an older version of
> configure: WARNING: Open MPI if you need support on systems with different
> configure: WARNING: PIDs for threads in the same process.  Open MPI 1.4.x
> configure: WARNING: supports such systems, as does at least some versions of the
> configure: WARNING: Open MPI 1.5.x series.
> configure: error: Cannot continue
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25618.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] MPI_Init seems to hang, but works after a, minute or two

2014-10-28 Thread Jeff Squyres (jsquyres)
On Oct 27, 2014, at 1:25 PM, maxinator333  wrote:

> Deactivating my WLAN did indeed do the trick!
> It also fails if a LAN cable is plugged in, regardless of whether I am 
> correctly connected (to the internet/gateway) or not (e.g. a wrong static 
> IP instead of the mandatory DHCP).
> Again: deactivating the relevant LAN interface helps.
> In contrast to LAN, for WLAN it seems to make a difference whether I'm 
> connected to some network or not. If not connected, it works without 
> deactivating the whole hardware.

If you're only running on a single machine, you can deactivate the network 
transports in Open MPI and only use the shared memory transport.  That should 
allow you to run without deactivating any hardware.  E.g.

mpirun --mca btl sm,self ...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] MPI_Init seems to hang, but works after a, minute or two

2014-10-28 Thread maxinator333

It doesn't seem to work (switching off WLAN still works).
mpicc mpiinit.c -o mpiinit.exe; time mpirun --mca btl sm,self -n 2 ./mpiinit.exe

real    0m43.733s
user    0m0.888s
sys     0m0.824s
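
(The mpiinit.c source was not posted; presumably it is a minimal init/finalize test along these lines, where any startup delay shows up directly in the "time" output:)

#include <mpi.h>

/* minimal startup test: the program does no communication, so the
 * reported wall-clock time is essentially MPI_Init + MPI_Finalize */
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}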

On 28.10.2014 13:40, Jeff Squyres (jsquyres) wrote:

> On Oct 27, 2014, at 1:25 PM, maxinator333  wrote:
> 
>> Deactivating my WLAN did indeed do the trick!
>> It also fails if a LAN cable is plugged in, regardless of whether I am 
>> correctly connected (to the internet/gateway) or not (e.g. a wrong static 
>> IP instead of the mandatory DHCP).
>> Again: deactivating the relevant LAN interface helps.
>> In contrast to LAN, for WLAN it seems to make a difference whether I'm 
>> connected to some network or not. If not connected, it works without 
>> deactivating the whole hardware.
> 
> If you're only running on a single machine, you can deactivate the network 
> transports in Open MPI and only use the shared memory transport.  That should 
> allow you to run without deactivating any hardware.  E.g.
> 
> mpirun --mca btl sm,self ...





Re: [OMPI users] MPI_Init seems to hang, but works after a, minute or two

2014-10-28 Thread Jeff Squyres (jsquyres)
On Oct 28, 2014, at 9:02 AM, maxinator333  wrote:

> It doesn't seem to work (switching off WLAN still works).
> mpicc mpiinit.c -o mpiinit.exe; time mpirun --mca btl sm,self -n 2 
> ./mpiinit.exe
> 
> real    0m43.733s
> user    0m0.888s
> sys     0m0.824s

Ah, this must be an ORTE issue, then (i.e., the run-time system beneath the MPI 
layer).

Try specifying that ORTE should use the loopback interface:

mpirun --mca btl sm,self --mca oob_tcp_if_include lo ...

(actually, I don't know what the loopback interface is called on Windows; it's 
typically "lo" in Linux 2.6 kernels...)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Java FAQ Page out of date

2014-10-28 Thread Jeff Squyres (jsquyres)
Thanks Brock; I opened https://github.com/open-mpi/ompi/issues/254 to track the 
issue.

On Oct 27, 2014, at 12:57 AM, Brock Palen  wrote:

> I think a lot of the information on this page:
> 
> http://www.open-mpi.org/faq/?category=java
> 
> is out of date with the 1.8 release. 
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25594.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Marco Atzeri

On 10/28/2014 12:04 PM, Gilles Gouaillardet wrote:
> Marco,
> 
> attached is a patch that fixes the issue
> /* i could not yet find why this does not occur on Linux ... */
> 
> could you please give it a try?
> 
> Cheers,
> 
> Gilles



It solves the issue on 64-bit.
I see no growing memory usage anymore.

I will build the 32-bit version and then upload both as 1.8.3-2.

Thanks
Marco



Re: [OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc

2014-10-28 Thread Gilles Gouaillardet
Hi Siegmar,

From the jvm logs, there is an alignment error in native_get_attr but i could 
not find it by reading the source code.
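
To illustrate the class of bug in question: SPARC raises SIGBUS on misaligned loads, so dereferencing a multi-byte value at an arbitrary offset (for example inside a packed message buffer) fails there even though it works on x86. A sketch of the usual portable pattern (illustrative only, not the pmix code itself):

#include <stdint.h>
#include <string.h>

/* read a 64-bit value from a possibly unaligned address */
static uint64_t read_u64(const void *src)
{
    uint64_t v;
    memcpy(&v, src, sizeof(v));    /* safe at any alignment */
    return v;
    /* by contrast, *(const uint64_t *) src may SIGBUS on SPARC
     * when src is not 8-byte aligned */
}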

Could you please do
ulimit -c unlimited
mpiexec ...
and then
gdb /bin/java core
And run bt on all threads until you get a line number in native_get_attr

Thanks

Gilles

Siegmar Gross  wrote:
>Hi,
>
>today I installed openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc
>with gcc-4.9.1 and Java 8. Now a very simple Java program works
>as expected, but other Java programs still break. I removed the
>warnings about "shmem.jar" and used the following configure
>command.
>
>tyr openmpi-dev-178-ga16c1e4-SunOS.sparc.64_gcc 406 head config.log \
>  | grep openmpi
>$ ../openmpi-dev-178-ga16c1e4/configure
>  --prefix=/usr/local/openmpi-1.9.0_64_gcc
>  --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64
>  --with-jdk-bindir=/usr/local/jdk1.8.0/bin
>  --with-jdk-headers=/usr/local/jdk1.8.0/include
>  JAVA_HOME=/usr/local/jdk1.8.0
>  LDFLAGS=-m64 CC=gcc CXX=g++ FC=gfortran CFLAGS=-m64 -D_REENTRANT
>  CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp
>  CPPFLAGS= -D_REENTRANT CXXCPPFLAGS=
>  --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java
>  --enable-mpi-thread-multiple --with-threads=posix
>  --with-hwloc=internal
>  --without-verbs --with-wrapper-cflags=-std=c11 -m64
>  --with-wrapper-cxxflags=-m64 --enable-debug
>
>
>tyr java 290 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler 
>version:"
>  Open MPI repo revision: dev-178-ga16c1e4
>  C compiler version: 4.9.1
>
>
>
>> > regarding the BUS error reported by Siegmar, i also committed
>> > 62bde1fcb554079143030bb305512c236672386f
>> > in order to fix it (this is based on code review only, i have no sparc64
>> > hardware to verify it is sufficient)
>> 
>> I'll test it when a new nightly snapshot is available for the trunk.
>
>
>tyr java 291 mpijavac InitFinalizeMain.java 
>tyr java 292 mpiexec -np 1 java InitFinalizeMain
>Hello!
>
>tyr java 293 mpijavac BcastIntMain.java 
>tyr java 294 mpiexec -np 2 java BcastIntMain
>#
># A fatal error has been detected by the Java Runtime Environment:
>#
>#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24792, tid=2
>...
>
>
>
>tyr java 296 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>...
>(gdb) run -np 2 java BcastIntMain
>Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 2 java 
>BcastIntMain
>[Thread debugging using libthread_db enabled]
>[New Thread 1 (LWP 1)]
>[New LWP 2]
>#
># A fatal error has been detected by the Java Runtime Environment:
>#
>#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24814, tid=2
>#
># JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
># Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode 
>solaris-sparc compressed oops)
># Problematic frame:
># C  [mca_pmix_native.so+0x10bfc]  native_get_attr+0x3000
>#
># Failed to write core dump. Core dumps have been disabled. To enable core 
>dumping, try "ulimit -c unlimited" before starting Java again
>#
># An error report file with more information is saved as:
># /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid24814.log
>#
># A fatal error has been detected by the Java Runtime Environment:
>#
>#  SIGBUS (0xa) at pc=0xfffee3210bfc, pid=24812, tid=2
>#
># JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
># Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode 
>solaris-sparc compressed oops)
># Problematic frame:
># C  [mca_pmix_native.so+0x10bfc]  native_get_attr+0x3000
>#
># Failed to write core dump. Core dumps have been disabled. To enable core 
>dumping, try "ulimit -c unlimited" before starting Java again
>#
># An error report file with more information is saved as:
># /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid24812.log
>#
># If you would like to submit a bug report, please visit:
>#   http://bugreport.sun.com/bugreport/crash.jsp
># The crash happened outside the Java Virtual Machine in native code.
># See problematic frame for where to report the bug.
>#
>[tyr:24814] *** Process received signal ***
>[tyr:24814] Signal: Abort (6)
>[tyr:24814] Signal code:  (-1)
>/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
>/export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdc2d4
>/lib/sparcv9/libc.so.1:0xd8b98
>/lib/sparcv9/libc.so.1:0xcc70c
>/lib/sparcv9/libc.so.1:0xcc918
>/lib/sparcv9/libc.so.1:0xdd2d0 [ Signal 6 (ABRT)]
>/lib/sparcv9/libc.so.1:_thr_sigsetmask+0x1c4
>/lib/sparcv9/libc.so.1:sigprocmask+0x28
>/lib/sparcv9/libc.so.1:_sigrelse+0x5c
>/lib/sparcv9/libc.so.1:abort+0xc0
>/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xb3cb90
>/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xd97a04
>/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:JVM_handle_solaris_signal+0xc0c
>/export2/prog/SunOS_sparc/jdk1.8.0/jre/lib/sparcv9/server/libjvm.so:0xb44e84
>/lib/sparcv9

Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Gilles Gouaillardet
Thanks Marco,

pthread_mutex_init calls calloc under cygwin but does not allocate memory under 
linux, so not invoking pthread_mutex_destroy causes a memory leak only under 
cygwin.
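
In miniature, the pattern the patch restores (every mutex init balanced by a destroy):

#include <pthread.h>

/* on Cygwin, pthread_mutex_init allocates internal state via calloc,
 * so an unmatched init leaks that allocation on every request */
void balanced_mutex_use(void)
{
    pthread_mutex_t lock;

    pthread_mutex_init(&lock, NULL);    /* may allocate */
    /* ... use the mutex ... */
    pthread_mutex_destroy(&lock);       /* frees what init allocated */
}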

Gilles

Marco Atzeri  wrote:
>On 10/28/2014 12:04 PM, Gilles Gouaillardet wrote:
>> Marco,
>>
>> attached is a patch that fixes the issue
>> /* i could not yet find why this does not occur on Linux ... */
>>
>> could you please give it a try?
>>
>> Cheers,
>>
>> Gilles
>>
>
>It solves the issue on 64-bit.
>I see no growing memory usage anymore.
>
>I will build the 32-bit version and then upload both as 1.8.3-2.
>
>Thanks
>Marco
>
>___
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: 
>http://www.open-mpi.org/community/lists/users/2014/10/25630.php


[OMPI users] Allgather in OpenMPI 1.4.3

2014-10-28 Thread Sebastian Rettenberger

Hi,

I know 1.4.3 is really old but I am currently stuck with it. However, 
there seems to be a bug in Allgather.


I have attached the source of an example program.

The output I would expect is:

rettenbs@hpcsccs4:/tmp$ mpiexec -np 5 ./a.out
0 0 1 2
1 0 1 2
2 0 1 2
3 0 1 2
4 0 1 2


But I get different results when I run the program multiple times:

rettenbs@hpcsccs4:/tmp$ mpiexec -np 5 ./a.out
0 0 1 2
1 0 1 2
2 0 1 2
3 2000 2001 2002
4 0 1 2
rettenbs@hpcsccs4:/tmp$ mpiexec -np 5 ./a.out
0 0 1 2
1 0 1 2
2 0 1 2
3 2000 2001 2002
4 3000 3001 3002


This bug is probably already fixed. Does anybody know in which version?

Best regards,
Sebastian

--
Sebastian Rettenberger, M.Sc.
Technische Universität München
Department of Informatics
Chair of Scientific Computing
Boltzmannstrasse 3, 85748 Garching, Germany
http://www5.in.tum.de/
#include <mpi.h>

#include <iostream>

int main(int argc, char* argv[])
{
	MPI_Init(&argc, &argv);

	int size, rank;
	MPI_Comm_size(MPI_COMM_WORLD, &size);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	int s[25];
	for (int i = 0; i < 25; i++)
		s[i] = rank*1000 + i;

	int r[500]; /* room for up to 20 ranks at 25 ints each */
	MPI_Allgather(s, 25, MPI_INT, r, 25, MPI_INT, MPI_COMM_WORLD);

	std::cout << rank << ' ' << r[0] << ' ' << r[1] << ' ' << r[2] << std::endl;

	MPI_Finalize();
}
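
A self-checking variant of the print step (a sketch: it verifies every gathered entry rather than only the first three, which makes intermittent corruption like the above easier to catch; it drops into the program above after the MPI_Allgather call):

	/* entry i gathered from rank p must equal p*1000 + i */
	int errors = 0;
	for (int p = 0; p < size; p++)
		for (int i = 0; i < 25; i++)
			if (r[p*25 + i] != p*1000 + i)
				errors++;
	if (errors > 0)
		std::cout << rank << ": " << errors << " corrupted entries" << std::endl;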




Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Marco Atzeri

On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote:
> Thanks Marco,
> 
> pthread_mutex_init calls calloc under cygwin but does not allocate memory under 
> linux, so not invoking pthread_mutex_destroy causes a memory leak only under 
> cygwin.
> 
> Gilles


thanks for the work.

uploading 1.8.3-2 on www.cygwin.com

Regards
Marco


Re: [OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc

2014-10-28 Thread Siegmar Gross
Hi Gilles,

> From the jvm logs, there is an alignment error in native_get_attr
> but i could not find it by reading the source code.
> 
> Could you please do
> ulimit -c unlimited
> mpiexec ...
> and then
> gdb /bin/java core
> And run bt on all threads until you get a line number in native_get_attr

I found pmix_native.c:1131 in native_get_attr, attached gdb to the
Java process, and set a breakpoint at this line. From there I
single-stepped until I got SIGSEGV, so that you can see what happened.


(gdb) b pmix_native.c:1131
No source file named pmix_native.c.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 1 (pmix_native.c:1131) pending.
(gdb) thread 14
[Switching to thread 14 (Thread 2 (LWP 2))]
#0  0x7eadc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
(gdb) f 3
#3  0xfffee5122230 in JNI_OnLoad (vm=0x7e57e9d8 , 
reserved=0x0)
at ../../../../../openmpi-dev-178-ga16c1e4/ompi/mpi/java/c/mpi_MPI.c:128
128 while (_dbg) poll(NULL, 0, 1);
(gdb) set _dbg=0
(gdb) c
Continuing.
[New LWP13]

Breakpoint 1, native_get_attr (attr=0xfffee2e05db0 "pmix.jobid", 
kv=0x7b4ff028)
at 
../../../../../openmpi-dev-178-ga16c1e4/opal/mca/pmix/native/pmix_native.c:1131
1131        OPAL_OUTPUT_VERBOSE((1, opal_pmix_base_framework.framework_output,
(gdb) s
opal_proc_local_get () at ../../../openmpi-dev-178-ga16c1e4/opal/util/proc.c:80
80  return opal_proc_my_name;
(gdb) 
81  }
(gdb) 
_process_name_print_for_opal (procname=14259803799433510912)
at ../../openmpi-dev-178-ga16c1e4/orte/runtime/orte_init.c:64
64  orte_process_name_t* rte_name = (orte_process_name_t*)&procname;
(gdb) 
65  return ORTE_NAME_PRINT(rte_name);
(gdb) 
orte_util_print_name_args (name=0x7b4feb90)
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:122
122 if (NULL == name) {
(gdb) 
142 job = orte_util_print_jobids(name->jobid);
(gdb) 
orte_util_print_jobids (job=3320119297)
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:170
170 ptr = get_print_name_buffer();
(gdb) 
get_print_name_buffer ()
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:92
92  if (!fns_init) {
(gdb) 
101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr);
(gdb) 
opal_tsd_getspecific (key=4, valuep=0x7b4fe8a0)
at ../../openmpi-dev-178-ga16c1e4/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb) 
164 return OPAL_SUCCESS;
(gdb) 
165 }
(gdb) 
get_print_name_buffer ()
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb) 
104 if (NULL == ptr) {
(gdb) 
113 return (orte_print_args_buffers_t*) ptr;
(gdb) 
114 }
(gdb) 
orte_util_print_jobids (job=3320119297)
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:172
172 if (NULL == ptr) {
(gdb) 
178 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb) 
179 ptr->cntr = 0;
(gdb) 
182 if (ORTE_JOBID_INVALID == job) {
(gdb) 
184 } else if (ORTE_JOBID_WILDCARD == job) {
(gdb) 
187 tmp1 = ORTE_JOB_FAMILY((unsigned long)job);
(gdb) 
188 tmp2 = ORTE_LOCAL_JOBID((unsigned long)job);
(gdb) 
189 snprintf(ptr->buffers[ptr->cntr++], 
(gdb) 
193 return ptr->buffers[ptr->cntr-1];
(gdb) 
194 }
(gdb) 
orte_util_print_name_args (name=0x7b4feb90)
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:143
143 vpid = orte_util_print_vpids(name->vpid);
(gdb) 
orte_util_print_vpids (vpid=0)
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:260
260 ptr = get_print_name_buffer();
(gdb) 
get_print_name_buffer ()
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:92
92  if (!fns_init) {
(gdb) 
101 ret = opal_tsd_getspecific(print_args_tsd_key, (void**)&ptr);
(gdb) 
opal_tsd_getspecific (key=4, valuep=0x7b4fe8b0)
at ../../openmpi-dev-178-ga16c1e4/opal/threads/tsd.h:163
163 *valuep = pthread_getspecific(key);
(gdb) 
164 return OPAL_SUCCESS;
(gdb) 
165 }
(gdb) 
get_print_name_buffer ()
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:102
102 if (OPAL_SUCCESS != ret) return NULL;
(gdb) 
104 if (NULL == ptr) {
(gdb) 
113 return (orte_print_args_buffers_t*) ptr;
(gdb) 
114 }
(gdb) 
orte_util_print_vpids (vpid=0)
at ../../openmpi-dev-178-ga16c1e4/orte/util/name_fns.c:262
262 if (NULL == ptr) {
(gdb) 
268 if (ORTE_PRINT_NAME_ARG_NUM_BUFS == ptr->cntr) {
(gdb) 
272 if (ORTE_VPID_INVALID == vpid) {
(gdb) 
274 } else if (ORTE_VPID_WILDCARD == vpid) {
(gdb) 
277 snprintf(ptr->buffers[ptr->cntr++], 
(gdb) 
281 return ptr->buffers[ptr->cntr-1];
(gdb) 
282 }
(gdb) 
orte_util_print_name_args (name=0x7b4feb90)
at ../../openmpi-dev-178-ga16c1e4/or

Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Ralph Castain
Gilles: will you be committing this to trunk and PR to 1.8?


> On Oct 28, 2014, at 11:05 AM, Marco Atzeri  wrote:
> 
> On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote:
>> Thanks Marco,
>> 
>> pthread_mutex_init calls calloc under cygwin but does not allocate memory 
>> under linux, so not invoking pthread_mutex_destroy causes a memory leak only 
>> under cygwin.
>> 
>> Gilles
> 
> thanks for the work.
> 
> uploading 1.8.3-2 on www.cygwin.com
> 
> Regards
> Marco
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25634.php



Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Gilles Gouaillardet
Yep, will do today.

Ralph Castain  wrote:
>Gilles: will you be committing this to trunk and PR to 1.8?
>
>
>> On Oct 28, 2014, at 11:05 AM, Marco Atzeri  wrote:
>> 
>> On 10/28/2014 4:41 PM, Gilles Gouaillardet wrote:
>>> Thanks Marco,
>>> 
>>> pthread_mutex_init calls calloc under cygwin but does not allocate memory 
>>> under linux, so not invoking pthread_mutex_destroy causes a memory leak 
>>> only under cygwin.
>>> 
>>> Gilles
>> 
>> thanks for the work.
>> 
>> uploading 1.8.3-2 on www.cygwin.com
>> 
>> Regards
>> Marco
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/10/25634.php
>
>___
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: 
>http://www.open-mpi.org/community/lists/users/2014/10/25636.php