The JVM sets its own signal handlers, and it is important that Open MPI does not override them. This is what previously happened with PSM (InfiniPath), but that has since been fixed. You might be linking with a third-party library that hijacks the signal handlers and causes the crash (which would explain why I cannot reproduce the issue).
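One way to test that hypothesis: HotSpot ships libjsig.so precisely for the case where native code installs its own signal handlers (signal chaining), and -Xcheck:jni adds extra sanity checks on JNI calls. A sketch, assuming a JDK 8 directory layout and the jar/class from your earlier mails:

mpirun -np 3 -x LD_PRELOAD=$JAVA_HOME/jre/lib/amd64/libjsig.so java -Xcheck:jni -cp executor.jar de.uros.citlab.executor.test.TestSendBigFiles2

If the crash disappears with signal chaining preloaded, that points at a signal handler conflict rather than at the bindings themselves.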
The master branch has a revamped memory patcher (compared to v2.x or v1.10), and that could have some bad interactions with the JVM, so you might also give v2.x a try.

Cheers,

Gilles

On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
> You made the best of it... thanks a lot!
>
> Without MPI it runs.
> Just adding MPI.init() causes the crash!
>
> Maybe I installed something wrong...
>
> install the newest automake, autoconf, m4 and libtoolize, in the right order and with the same prefix
> check out ompi
> autogen
> configure with the same prefix, pointing to the same JDK I use later
> make
> make install
>
> I will test some different configurations of ./configure...
>
> On 07/08/2016 01:40 PM, Gilles Gouaillardet wrote:
>
> I am running out of ideas ...
>
> what if you do not run within slurm ?
> what if you do not use '-cp executor.jar' ?
> or what if you configure without --disable-dlopen --disable-mca-dso ?
>
> if you
> mpirun -np 1 ...
> then MPI_Bcast and MPI_Barrier are basically no-ops, so it is really weird that your program is still crashing. Another test is to comment out MPI_Bcast and MPI_Barrier and try again with -np 1.
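> Or, as a minimal sketch of that test (the class name InitOnly is just an example), strip the program down to MPI.Init and MPI.Finalize:
>
> import mpi.*;
>
> public class InitOnly {
>     public static void main(String[] args) throws MPIException {
>         MPI.Init(args);  // no communication at all, just init ...
>         System.err.println("rank " + MPI.COMM_WORLD.getRank() + " initialized");
>         MPI.Finalize();  // ... and finalize
>     }
> }
>
> if even this crashes with -np 1, the bindings/JVM interaction is the problem, not the bcast pattern.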
> Cheers,
>
> Gilles
>
> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>
>> In all cases, the same error.
>> This is my code:
>>
>> salloc -n 3
>> export IPATH_NO_BACKTRACE
>> ulimit -s 10240
>> mpirun -np 3 java -cp executor.jar de.uros.citlab.executor.test.TestSendBigFiles2
>>
>> The process also crashes with 1 or 2 cores.
>>
>> On 07/08/2016 12:32 PM, Gilles Gouaillardet wrote:
>>
>> you can try
>> export IPATH_NO_BACKTRACE
>> before invoking mpirun (that should not be needed though)
>>
>> Another test is to
>> ulimit -s 10240
>> before invoking mpirun.
>>
>> btw, do you use mpirun or srun?
>>
>> Can you reproduce the crash with 1 or 2 tasks?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>>
>>> Hello,
>>>
>>> configure:
>>> ./configure --enable-mpi-java --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen --disable-mca-dso
>>>
>>> 1 node with 3 cores. I use SLURM to allocate one node. I changed --mem, but it has no effect.
>>> salloc -n 3
>>>
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 256564
>>> max locked memory       (kbytes, -l) unlimited
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 100000
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) unlimited
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 4096
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>>
>>> uname -a
>>> Linux titan01.service 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> cat /etc/system-release
>>> CentOS Linux release 7.2.1511 (Core)
>>>
>>> What else do you need?
>>>
>>> Cheers, Gundram
>>>
>>> On 07/07/2016 10:05 AM, Gilles Gouaillardet wrote:
>>>
>>> Gundram,
>>>
>>> can you please provide more information on your environment:
>>>
>>> - configure command line
>>> - OS
>>> - memory available
>>> - ulimit -a
>>> - number of nodes
>>> - number of tasks used
>>> - interconnect used (if any)
>>> - batch manager (if any)
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 7/7/2016 4:17 PM, Gundram Leifert wrote:
>>>
>>> Hello Gilles,
>>>
>>> I tried your code and it crashes after 3-15 iterations (see (1)). It is always the same error (only the "94" varies).
>>>
>>> Meanwhile I think Java and MPI use the same memory, because when I delete the hash call, the program sometimes runs for more than 9k iterations. When it does crash, the problematic lines differ (see (2) and (3)). The crash also occurs on rank 0.
>>>
>>> ##### (1) #####
>>> # Problematic frame:
>>> # J 94 C2 de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I (42 bytes) @ 0x00002b03242dc9c4 [0x00002b03242dc860+0x164]
>>>
>>> ##### (2) #####
>>> # Problematic frame:
>>> # V  [libjvm.so+0x68d0f6]  JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*, Thread*)+0xb6
>>>
>>> ##### (3) #####
>>> # Problematic frame:
>>> # V  [libjvm.so+0x4183bf]  ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f
>>>
>>> Any more ideas?
>>>
>>> On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:
>>>
>>> Gundram,
>>>
>>> fwiw, I cannot reproduce the issue on my box:
>>>
>>> - CentOS 7
>>> - java version "1.8.0_71"
>>>   Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
>>>   Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
>>>
>>> I noticed that on non-zero ranks, saveMem is allocated at each iteration. Ideally, the garbage collector can take care of that and this should not be an issue.
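>>> Still, to rule out an interaction between garbage collection and the bindings, you could allocate the receive buffer once and reuse it. A sketch of the non-zero-rank loop only (names taken from your program):
>>>
>>> // allocate once, outside the loop, instead of new byte[lengthData[0]] per iteration
>>> byte[] saveMem = new byte[100000000];
>>> while (true) {
>>>     int[] lengthData = new int[1];
>>>     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>     if (lengthData[0] == 0) {
>>>         break;
>>>     }
>>>     MPI.COMM_WORLD.barrier();
>>>     MPI.COMM_WORLD.bcast(saveMem, lengthData[0], MPI.BYTE, 0);  // reuse the buffer
>>>     MPI.COMM_WORLD.barrier();
>>> }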
>>> Would you mind giving the attached file a try?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:
>>>
>>> I will have a look at it today.
>>>
>>> How did you configure Open MPI?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Thursday, July 7, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>>>
>>>> Hello Gilles,
>>>>
>>>> thank you for your hints! I made 3 changes, but unfortunately the same error occurs:
>>>>
>>>> update ompi:
>>>> commit ae8444682f0a7aa158caea08800542ce9874455e
>>>> Author: Ralph Castain <r...@open-mpi.org>
>>>> Date:   Tue Jul 5 20:07:16 2016 -0700
>>>>
>>>> update java:
>>>> java version "1.8.0_92"
>>>> Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
>>>> Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
>>>>
>>>> deleted the hashcode lines.
>>>>
>>>> Now I get this error message 100% of the time, after a varying number of iterations (15-300):
>>>>
>>>>  0/ 3:length = 100000000
>>>>  0/ 3:bcast length done (length = 100000000)
>>>>  1/ 3:bcast length done (length = 100000000)
>>>>  2/ 3:bcast length done (length = 100000000)
>>>> #
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  SIGSEGV (0xb) at pc=0x00002b3d022fcd24, pid=16578, tid=0x00002b3d29716700
>>>> #
>>>> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode linux-amd64 compressed oops)
>>>> # Problematic frame:
>>>> # V  [libjvm.so+0x414d24]  ciEnv::get_field_by_index(ciInstanceKlass*, int)+0x94
>>>> #
>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>> #
>>>> # An error report file with more information is saved as:
>>>> # /home/gl069/ompi/bin/executor/hs_err_pid16578.log
>>>> #
>>>> # Compiler replay data is saved as:
>>>> # /home/gl069/ompi/bin/executor/replay_pid16578.log
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>>> #
>>>> [titan01:16578] *** Process received signal ***
>>>> [titan01:16578] Signal: Aborted (6)
>>>> [titan01:16578] Signal code:  (-6)
>>>> [titan01:16578] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
>>>> [titan01:16578] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
>>>> [titan01:16578] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
>>>> [titan01:16578] [ 3] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
>>>> [titan01:16578] [ 4] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
>>>> [titan01:16578] [ 5] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
>>>> [titan01:16578] [ 6] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
>>>> [titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
>>>> [titan01:16578] [ 8] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
>>>> [titan01:16578] [ 9] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae]
>>>> [titan01:16578] [10] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade]
>>>> [titan01:16578] [11] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0]
>>>> [titan01:16578] [12] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>>> [titan01:16578] [13] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>>> [titan01:16578] [14] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>>> [titan01:16578] [15] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>>> [titan01:16578] [16] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>>> [titan01:16578] [17] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>>> [titan01:16578] [18] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>>> [titan01:16578] [19] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>>> [titan01:16578] [20] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>>> [titan01:16578] [21] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>>> [titan01:16578] [22] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3708c2)[0x2b3d022588c2]
>>>> [titan01:16578] [23] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3724e7)[0x2b3d0225a4e7]
>>>> [titan01:16578] [24] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a817)[0x2b3d02262817]
>>>> [titan01:16578] [25] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a92f)[0x2b3d0226292f]
>>>> [titan01:16578] [26] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x358edb)[0x2b3d02240edb]
>>>> [titan01:16578] [27] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35929e)[0x2b3d0224129e]
>>>> [titan01:16578] [28] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3593ce)[0x2b3d022413ce]
>>>> [titan01:16578] [29] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35973e)[0x2b3d0224173e]
>>>> [titan01:16578] *** End of error message ***
>>>> -------------------------------------------------------
>>>> Primary job terminated normally, but 1 process returned
>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 2 with PID 0 on node titan01 exited on signal 6 (Aborted).
>>>> --------------------------------------------------------------------------
>>>>
>>>> I don't know whether it is a problem with Java or Open MPI, but in recent years Java has worked without problems on my machine...
>>>>
>>>> Thank you for your tips in advance!
>>>> Gundram
>>>>
>>>> On 07/06/2016 03:10 PM, Gilles Gouaillardet wrote:
>>>>
>>>> Note: a race condition in MPI_Init was fixed yesterday in master.
>>>> Can you please update your Open MPI and try again?
>>>>
>>>> Hopefully the hang will disappear.
>>>>
>>>> Can you reproduce the crash with a simpler (and ideally deterministic) version of your program? The crash occurs in hashcode, and this makes little sense to me.
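>>>> As a cross-check on the hashcode angle, you could swap the hand-rolled loop for the standard library version; if JIT compilation of your loop is involved, the crash should move or disappear. A sketch (java.util.Arrays.hashCode(byte[]) is a standard JDK method, though it will produce different hash values than your loop):
>>>>
>>>> import java.util.Arrays;
>>>>
>>>> // instead of the hand-rolled hashcode(byte[]) loop:
>>>> int hash = Arrays.hashCode(saveMem);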
>>>> Can you also update your JDK?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Wednesday, July 6, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>>>>
>>>>> Hello Jason,
>>>>>
>>>>> thanks for your response! I think it is another problem. I am trying to send 100 MB byte arrays, so there are not many tries (between 10 and 30). I realized that executing this code can result in 3 different errors:
>>>>>
>>>>> 1. Most often, the posted error message occurs.
>>>>>
>>>>> 2. In <10% of the cases I get a livelock. I can see 3 Java processes, one at 200% and two at 100% CPU utilization. After ~15 minutes without new output, this error occurs:
>>>>>
>>>>> [thread 47499823949568 also had an error]
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  Internal Error (safepoint.cpp:317), pid=24256, tid=47500347131648
>>>>> #  guarantee(PageArmed == 0) failed: invariant
>>>>> #
>>>>> # JRE version: 7.0_25-b15
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>> # An error report file with more information is saved as:
>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid24256.log
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>> #
>>>>> [titan01:24256] *** Process received signal ***
>>>>> [titan01:24256] Signal: Aborted (6)
>>>>> [titan01:24256] Signal code:  (-6)
>>>>> [titan01:24256] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b336a324100]
>>>>> [titan01:24256] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b336a9815f7]
>>>>> [titan01:24256] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b336a982ce8]
>>>>> [titan01:24256] [ 3] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b336b44fac5]
>>>>> [titan01:24256] [ 4] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b336b5af137]
>>>>> [titan01:24256] [ 5] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x407262)[0x2b336b114262]
>>>>> [titan01:24256] [ 6] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x7c6c34)[0x2b336b4d3c34]
>>>>> [titan01:24256] [ 7] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a9c17)[0x2b336b5b6c17]
>>>>> [titan01:24256] [ 8] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8aa2c0)[0x2b336b5b72c0]
>>>>> [titan01:24256] [ 9] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x744270)[0x2b336b451270]
>>>>> [titan01:24256] [10] /usr/lib64/libpthread.so.0(+0x7dc5)[0x2b336a31cdc5]
>>>>> [titan01:24256] [11] /usr/lib64/libc.so.6(clone+0x6d)[0x2b336aa4228d]
>>>>> [titan01:24256] *** End of error message ***
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that process rank 0 with PID 0 on node titan01 exited on signal 6 (Aborted).
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> 3. In <10% of the cases I get a deadlock during MPI.Init. It hangs for more than 15 minutes without returning an error message...
>>>>>
>>>>> Can I enable some debug flags to see what happens on the C / Open MPI side?
>>>>>
>>>>> Thanks in advance for your help!
>>>>> Gundram Leifert
>>>>>
>>>>> On 07/05/2016 06:05 PM, Jason Maldonis wrote:
>>>>>
>>>>> After reading your thread, it looks like it may be related to an issue I had a few weeks ago (I'm a novice though).
>>>>> Maybe my thread will be of help:
>>>>> https://www.open-mpi.org/community/lists/users/2016/06/29425.php
>>>>>
>>>>> When you say "After a specific number of repetitions the process either hangs up or returns with a SIGSEGV.", do you mean that a single call hangs, or that at some point during the for loop a call hangs? If you mean the latter, then it might relate to my issue. Otherwise my thread probably won't be helpful.
>>>>>
>>>>> Jason Maldonis
>>>>> Research Assistant of Professor Paul Voyles
>>>>> Materials Science Grad Student
>>>>> University of Wisconsin, Madison
>>>>> 1509 University Ave, Rm M142
>>>>> Madison, WI 53706
>>>>> maldo...@wisc.edu
>>>>> 608-295-5532
>>>>>
>>>>> On Tue, Jul 5, 2016 at 9:58 AM, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am trying to send many byte arrays via broadcast. After a specific number of repetitions the process either hangs up or returns with a SIGSEGV. Can anyone help me solve the problem?
>>>>>>
>>>>>> ########## The code:
>>>>>>
>>>>>> import java.util.Random;
>>>>>> import mpi.*;
>>>>>>
>>>>>> public class TestSendBigFiles {
>>>>>>
>>>>>>     public static void log(String msg) {
>>>>>>         try {
>>>>>>             System.err.println(String.format("%2d/%2d:%s", MPI.COMM_WORLD.getRank(), MPI.COMM_WORLD.getSize(), msg));
>>>>>>         } catch (MPIException ex) {
>>>>>>             System.err.println(String.format("%2s/%2s:%s", "?", "?", msg));
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     private static int hashcode(byte[] bytearray) {
>>>>>>         if (bytearray == null) {
>>>>>>             return 0;
>>>>>>         }
>>>>>>         int hash = 39;
>>>>>>         for (int i = 0; i < bytearray.length; i++) {
>>>>>>             byte b = bytearray[i];
>>>>>>             hash = hash * 7 + (int) b;
>>>>>>         }
>>>>>>         return hash;
>>>>>>     }
>>>>>>
>>>>>>     public static void main(String args[]) throws MPIException {
>>>>>>         log("start main");
>>>>>>         MPI.Init(args);
>>>>>>         try {
>>>>>>             log("initialized done");
>>>>>>             byte[] saveMem = new byte[100000000];
>>>>>>             MPI.COMM_WORLD.barrier();
>>>>>>             Random r = new Random();
>>>>>>             r.nextBytes(saveMem);
>>>>>>             if (MPI.COMM_WORLD.getRank() == 0) {
>>>>>>                 for (int i = 0; i < 1000; i++) {
>>>>>>                     saveMem[r.nextInt(saveMem.length)]++;
>>>>>>                     log("i = " + i);
>>>>>>                     int[] lengthData = new int[]{saveMem.length};
>>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>>                     log("length = " + lengthData[0]);
>>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>>                     log("bcast length done (length = " + lengthData[0] + ")");
>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>                     MPI.COMM_WORLD.bcast(saveMem, lengthData[0], MPI.BYTE, 0);
>>>>>>                     log("bcast data done");
>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>                 }
>>>>>>                 MPI.COMM_WORLD.bcast(new int[]{0}, 1, MPI.INT, 0);
>>>>>>             } else {
>>>>>>                 while (true) {
>>>>>>                     int[] lengthData = new int[1];
>>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>>                     log("bcast length done (length = " + lengthData[0] + ")");
>>>>>>                     if (lengthData[0] == 0) {
>>>>>>                         break;
>>>>>>                     }
>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>                     saveMem = new byte[lengthData[0]];
>>>>>>                     MPI.COMM_WORLD.bcast(saveMem, saveMem.length, MPI.BYTE, 0);
>>>>>>                     log("bcast data done");
>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>>                 }
>>>>>>             }
>>>>>>             MPI.COMM_WORLD.barrier();
>>>>>>         } catch (MPIException ex) {
>>>>>>             System.out.println("caught error: " + ex);
>>>>>>             log(ex.getMessage());
>>>>>>         } catch (RuntimeException ex) {
>>>>>>             System.out.println("caught error: " + ex);
>>>>>>             log(ex.getMessage());
>>>>>>         } finally {
>>>>>>             MPI.Finalize();
>>>>>>         }
>>>>>>     }
>>>>>> }
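>>>>>>
>>>>>> I compile and run the test roughly like this (mpijavac is the wrapper compiler that ships with the Open MPI Java bindings; at run time mpi.jar has to be on the classpath, the paths below are just an example):
>>>>>>
>>>>>> mpijavac TestSendBigFiles.java
>>>>>> mpirun -np 3 java -cp .:$HOME/ompi/lib/mpi.jar TestSendBigFiles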
>>>>>>
>>>>>> ############ The Error (if it does not just hang up):
>>>>>>
>>>>>> (output of the two crashing ranks is interleaved)
>>>>>>
>>>>>> #
>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>> #
>>>>>> #  SIGSEGV (0xb) at pc=0x00002b7e9c86e3a1, pid=1172, tid=47822674495232
>>>>>> #
>>>>>> #
>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>> # JRE version: 7.0_25-b15
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
>>>>>> # Problematic frame:
>>>>>> #
>>>>>> #
>>>>>> #  SIGSEGV (0xb) at pc=0x00002af69c0693a1, pid=1173, tid=47238546896640
>>>>>> #
>>>>>> # JRE version: 7.0_25-b15
>>>>>> J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>>> #
>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>> #
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
>>>>>> # Problematic frame:
>>>>>> # J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>>> #
>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>> #
>>>>>> # An error report file with more information is saved as:
>>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1172.log
>>>>>> # An error report file with more information is saved as:
>>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1173.log
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>> #
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>> #
>>>>>> [titan01:01172] *** Process received signal ***
>>>>>> [titan01:01172] Signal: Aborted (6)
>>>>>> [titan01:01172] Signal code:  (-6)
>>>>>> [titan01:01173] *** Process received signal ***
>>>>>> [titan01:01173] Signal: Aborted (6)
>>>>>> [titan01:01173] Signal code:  (-6)
>>>>>> [titan01:01172] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b7e9596a100]
>>>>>> [titan01:01172] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b7e95fc75f7]
>>>>>> [titan01:01172] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b7e95fc8ce8]
>>>>>> [titan01:01172] [ 3] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b7e96a95ac5]
>>>>>> [titan01:01172] [ 4] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b7e96bf5137]
>>>>>> [titan01:01172] [ 5] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2b7e96a995e0]
>>>>>> [titan01:01172] [ 6] [titan01:01173] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2af694ded100]
>>>>>> [titan01:01173] [ 1] /usr/lib64/libc.so.6(+0x35670)[0x2b7e95fc7670]
>>>>>> [titan01:01172] [ 7] [0x2b7e9c86e3a1]
>>>>>> [titan01:01172] *** End of error message ***
>>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2af69544a5f7]
>>>>>> [titan01:01173] [ 2]
>>>>>> /usr/lib64/libc.so.6(abort+0x148)[0x2af69544bce8]
>>>>>> [titan01:01173] [ 3] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2af695f18ac5]
>>>>>> [titan01:01173] [ 4] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2af696078137]
>>>>>> [titan01:01173] [ 5] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2af695f1c5e0]
>>>>>> [titan01:01173] [ 6] /usr/lib64/libc.so.6(+0x35670)[0x2af69544a670]
>>>>>> [titan01:01173] [ 7] [0x2af69c0693a1]
>>>>>> [titan01:01173] *** End of error message ***
>>>>>> -------------------------------------------------------
>>>>>> Primary job terminated normally, but 1 process returned
>>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>>> -------------------------------------------------------
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 1 with PID 0 on node titan01 exited on signal 6 (Aborted).
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> ######## CONFIGURATION:
>>>>>>
>>>>>> I used the ompi master sources from github:
>>>>>> commit 267821f0dd405b5f4370017a287d9a49f92e734a
>>>>>> Author: Gilles Gouaillardet <gil...@rist.or.jp>
>>>>>> Date:   Tue Jul 5 13:47:50 2016 +0900
>>>>>>
>>>>>> ./configure --enable-mpi-java --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen --disable-mca-dso
>>>>>>
>>>>>> Thanks a lot for your help!
>>>>>> Gundram