Hi Gundram,

Could you configure without the --disable-dlopen option and retry?
Howard

On Friday, July 8, 2016, Gilles Gouaillardet wrote:
> the JVM sets its own signal handlers, and it is important that Open MPI does
> not override them.
> this is what previously happened with PSM (infinipath) but this has been
> solved since.
> you might be linking with a third-party library that hijacks signal
> handlers and causes the crash
> (which would explain why I cannot reproduce the issue)
>
> the master branch has a revamped memory patcher (compared to v2.x or
> v1.10), and that could have some bad interactions with the JVM, so you
> might also give v2.x a try
>
> Cheers,
>
> Gilles
>
> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de>
> wrote:
>
>> You made the best of it... thanks a lot!
>>
>> Without MPI it runs.
>> Just adding MPI.init() causes the crash!
>>
>> maybe I installed something wrong...
>>
>> install the newest automake, autoconf, m4, libtoolize in the right order
>> and with the same prefix
>> check out ompi,
>> autogen
>> configure with the same prefix, pointing to the same JDK I later use
>> make
>> make install
>>
>> I will test some different configurations of ./configure...
>>
>>
>> On 07/08/2016 01:40 PM, Gilles Gouaillardet wrote:
>>
>> I am running out of ideas ...
>>
>> what if you do not run within slurm?
>> what if you do not use '-cp executor.jar'?
>> or what if you configure without --disable-dlopen --disable-mca-dso?
>>
>> if you
>> mpirun -np 1 ...
>> then MPI_Bcast and MPI_Barrier are basically no-ops, so it is really weird
>> that your program is still crashing. Another test is to comment out MPI_Bcast
>> and MPI_Barrier and try again with -np 1
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de>
>> wrote:
>>
>>> In any case, the same error.
>>> this is my code:
>>>
>>> salloc -n 3
>>> export IPATH_NO_BACKTRACE
>>> ulimit -s 10240
>>> mpirun -np 3 java -cp executor.jar
>>> de.uros.citlab.executor.test.TestSendBigFiles2
>>>
>>>
>>> Also with one or two cores, the process crashes.
>>>
>>>
>>> On 07/08/2016 12:32 PM, Gilles Gouaillardet wrote:
>>>
>>> you can try
>>> export IPATH_NO_BACKTRACE
>>> before invoking mpirun (that should not be needed though)
>>>
>>> another test is to
>>> ulimit -s 10240
>>> before invoking mpirun.
>>>
>>> btw, do you use mpirun or srun?
>>>
>>> can you reproduce the crash with 1 or 2 tasks?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> configure:
>>>> ./configure --enable-mpi-java
>>>> --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen
>>>> --disable-mca-dso
>>>>
>>>>
>>>> 1 node with 3 cores. I use SLURM to allocate one node. I changed --mem,
>>>> but it has no effect.
>>>> salloc -n 3
>>>>
>>>>
>>>> core file size          (blocks, -c) 0
>>>> data seg size           (kbytes, -d) unlimited
>>>> scheduling priority             (-e) 0
>>>> file size               (blocks, -f) unlimited
>>>> pending signals                 (-i) 256564
>>>> max locked memory       (kbytes, -l) unlimited
>>>> max memory size         (kbytes, -m) unlimited
>>>> open files                      (-n) 100000
>>>> pipe size            (512 bytes, -p) 8
>>>> POSIX message queues     (bytes, -q) 819200
>>>> real-time priority              (-r) 0
>>>> stack size              (kbytes, -s) unlimited
>>>> cpu time               (seconds, -t) unlimited
>>>> max user processes              (-u) 4096
>>>> virtual memory          (kbytes, -v) unlimited
>>>> file locks                      (-x) unlimited
>>>>
>>>> uname -a
>>>> Linux titan01.service 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31
>>>> 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> cat /etc/system-release
>>>> CentOS Linux release 7.2.1511 (Core)
>>>>
>>>> what else do you need?
>>>> Cheers, Gundram
>>>>
>>>> On 07/07/2016 10:05 AM, Gilles Gouaillardet wrote:
>>>>
>>>> Gundram,
>>>>
>>>> can you please provide more information on your environment:
>>>>
>>>> - configure command line
>>>> - OS
>>>> - memory available
>>>> - ulimit -a
>>>> - number of nodes
>>>> - number of tasks used
>>>> - interconnect used (if any)
>>>> - batch manager (if any)
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On 7/7/2016 4:17 PM, Gundram Leifert wrote:
>>>>
>>>> Hello Gilles,
>>>>
>>>> I tried your code and it crashes after 3-15 iterations (see (1)). It is
>>>> always the same error (only the "94" varies).
>>>>
>>>> Meanwhile I think Java and MPI use the same memory, because when I
>>>> delete the hash call, the program sometimes runs more than 9k iterations.
>>>> When it crashes, there are different lines (see (2) and (3)). The
>>>> crashes also occur on rank 0.
>>>>
>>>> ##### (1) #####
>>>> # Problematic frame:
>>>> # J 94 C2 de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I
>>>> (42 bytes) @ 0x00002b03242dc9c4 [0x00002b03242dc860+0x164]
>>>>
>>>> ##### (2) #####
>>>> # Problematic frame:
>>>> # V [libjvm.so+0x68d0f6]
>>>> JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*,
>>>> Thread*)+0xb6
>>>>
>>>> ##### (3) #####
>>>> # Problematic frame:
>>>> # V [libjvm.so+0x4183bf]
>>>> ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f
>>>>
>>>> Any more ideas?
>>>>
>>>> On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:
>>>>
>>>> Gundram,
>>>>
>>>> fwiw, I cannot reproduce the issue on my box
>>>>
>>>> - centos 7
>>>>
>>>> - java version "1.8.0_71"
>>>> Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
>>>>
>>>> I noticed that on non-zero ranks, saveMem is allocated at each iteration.
>>>> ideally, the garbage collector can take care of that and this should
>>>> not be an issue.
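On the allocation point above: one way for the non-zero ranks to keep a single saveMem buffer across iterations, instead of handing the garbage collector a fresh 100 MB array per bcast, is a grow-only reuse helper. This is a sketch, not code from the thread; the name ensureCapacity and the standalone framing are illustrative, and no MPI call is involved:

```java
public class BufferReuseSketch {

    // Reallocate only when the incoming length exceeds the current capacity,
    // so each iteration reuses the old array instead of allocating anew.
    static byte[] ensureCapacity(byte[] buf, int needed) {
        return (buf != null && buf.length >= needed) ? buf : new byte[needed];
    }

    public static void main(String[] args) {
        byte[] saveMem = null;
        saveMem = ensureCapacity(saveMem, 100);      // first call allocates
        byte[] again = ensureCapacity(saveMem, 50);  // smaller request: reused
        System.out.println(again == saveMem);        // prints true
        System.out.println(ensureCapacity(saveMem, 200).length); // prints 200
    }
}
```

In the receive loop of the program quoted later in this thread, this would replace `saveMem = new byte[lengthData[0]];` with `saveMem = ensureCapacity(saveMem, lengthData[0]);` before the byte bcast.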
>>>> would you mind giving the attached file a try?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:
>>>>
>>>> I will have a look at it today
>>>>
>>>> how did you configure OpenMPI?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Thursday, July 7, 2016, Gundram Leifert <
>>>> gundram.leif...@uni-rostock.de> wrote:
>>>>
>>>>> Hello Gilles,
>>>>>
>>>>> thank you for your hints! I made 3 changes; unfortunately the same
>>>>> error occurs:
>>>>>
>>>>> update ompi:
>>>>> commit ae8444682f0a7aa158caea08800542ce9874455e
>>>>> Author: Ralph Castain <r...@open-mpi.org>
>>>>> Date: Tue Jul 5 20:07:16 2016 -0700
>>>>>
>>>>> update java:
>>>>> java version "1.8.0_92"
>>>>> Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
>>>>> Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
>>>>>
>>>>> delete the hashcode lines.
>>>>>
>>>>> Now I get this error message 100% of the time, after a varying number
>>>>> of iterations (15-300):
>>>>>
>>>>> 0/ 3:length = 100000000
>>>>> 0/ 3:bcast length done (length = 100000000)
>>>>> 1/ 3:bcast length done (length = 100000000)
>>>>> 2/ 3:bcast length done (length = 100000000)
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> # SIGSEGV (0xb) at pc=0x00002b3d022fcd24, pid=16578,
>>>>> tid=0x00002b3d29716700
>>>>> #
>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build
>>>>> 1.8.0_92-b14)
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode
>>>>> linux-amd64 compressed oops)
>>>>> # Problematic frame:
>>>>> # V [libjvm.so+0x414d24] ciEnv::get_field_by_index(ciInstanceKlass*,
>>>>> int)+0x94
>>>>> #
>>>>> # Failed to write core dump. Core dumps have been disabled.
To enable >>>>> core dumping, try "ulimit -c unlimited" before starting Java again >>>>> # >>>>> # An error report file with more information is saved as: >>>>> # /home/gl069/ompi/bin/executor/hs_err_pid16578.log >>>>> # >>>>> # Compiler replay data is saved as: >>>>> # /home/gl069/ompi/bin/executor/replay_pid16578.log >>>>> # >>>>> # If you would like to submit a bug report, please visit: >>>>> # http://bugreport.java.com/bugreport/crash.jsp >>>>> # >>>>> [titan01:16578] *** Process received signal *** >>>>> [titan01:16578] Signal: Aborted (6) >>>>> [titan01:16578] Signal code: (-6) >>>>> [titan01:16578] [ 0] >>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100] >>>>> [titan01:16578] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7] >>>>> [titan01:16578] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8] >>>>> [titan01:16578] [ 3] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605] >>>>> [titan01:16578] [ 4] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63] >>>>> [titan01:16578] [ 5] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f] >>>>> [titan01:16578] [ 6] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3] >>>>> [titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670] >>>>> [titan01:16578] [ 8] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24] >>>>> [titan01:16578] [ 9] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae] >>>>> [titan01:16578] [10] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade] >>>>> [titan01:16578] [11] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0] >>>>> [titan01:16578] [12] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b] 
>>>>> [titan01:16578] [13] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6] >>>>> [titan01:16578] [14] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf] >>>>> [titan01:16578] [15] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412] >>>>> [titan01:16578] [16] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d] >>>>> [titan01:16578] [17] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b] >>>>> [titan01:16578] [18] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6] >>>>> [titan01:16578] [19] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf] >>>>> [titan01:16578] [20] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412] >>>>> [titan01:16578] [21] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d] >>>>> [titan01:16578] [22] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3708c2)[0x2b3d022588c2] >>>>> [titan01:16578] [23] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3724e7)[0x2b3d0225a4e7] >>>>> [titan01:16578] [24] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a817)[0x2b3d02262817] >>>>> [titan01:16578] [25] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a92f)[0x2b3d0226292f] >>>>> [titan01:16578] [26] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x358edb)[0x2b3d02240edb] >>>>> [titan01:16578] [27] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35929e)[0x2b3d0224129e] >>>>> [titan01:16578] [28] >>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3593ce)[0x2b3d022413ce] >>>>> [titan01:16578] [29] >>>>> 
/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35973e)[0x2b3d0224173e]
>>>>> [titan01:16578] *** End of error message ***
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that process rank 2 with PID 0 on node titan01 exited
>>>>> on signal 6 (Aborted).
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> I don't know if it is a problem of Java or OMPI, but for the last few
>>>>> years Java has worked with no problems on my machine...
>>>>>
>>>>> Thank you for your tips in advance!
>>>>> Gundram
>>>>>
>>>>> On 07/06/2016 03:10 PM, Gilles Gouaillardet wrote:
>>>>>
>>>>> Note a race condition in MPI_Init has been fixed yesterday in the
>>>>> master.
>>>>> Can you please update your OpenMPI and try again?
>>>>>
>>>>> hopefully the hang will disappear.
>>>>>
>>>>> Can you reproduce the crash with a simpler (and ideally deterministic)
>>>>> version of your program?
>>>>> The crash occurs in hashcode, and this makes little sense to me. Can
>>>>> you also update your JDK?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Wednesday, July 6, 2016, Gundram Leifert <
>>>>> gundram.leif...@uni-rostock.de> wrote:
>>>>>
>>>>>> Hello Jason,
>>>>>>
>>>>>> thanks for your response! I think it is another problem. I try to
>>>>>> send 100 MB byte arrays, so there are not many tries (between 10 and
>>>>>> 30). I realized that the execution of this code can result in 3
>>>>>> different errors:
>>>>>>
>>>>>> 1. most often, the posted error message occurs.
>>>>>>
>>>>>> 2. in <10% of the cases I have a livelock. I can see 3 Java processes,
>>>>>> one with 200% and two with 100% processor utilization.
After ~15 minutes >>>>>> without new system outputs this error occurs. >>>>>> >>>>>> >>>>>> [thread 47499823949568 also had an error] >>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>> # >>>>>> # Internal Error (safepoint.cpp:317), pid=24256, tid=47500347131648 >>>>>> # guarantee(PageArmed == 0) failed: invariant >>>>>> # >>>>>> # JRE version: 7.0_25-b15 >>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode >>>>>> linux-amd64 compressed oops) >>>>>> # Failed to write core dump. Core dumps have been disabled. To enable >>>>>> core dumping, try "ulimit -c unlimited" before starting Java again >>>>>> # >>>>>> # An error report file with more information is saved as: >>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid24256.log >>>>>> # >>>>>> # If you would like to submit a bug report, please visit: >>>>>> # <http://bugreport.sun.com/bugreport/crash.jsp> >>>>>> http://bugreport.sun.com/bugreport/crash.jsp >>>>>> # >>>>>> [titan01:24256] *** Process received signal *** >>>>>> [titan01:24256] Signal: Aborted (6) >>>>>> [titan01:24256] Signal code: (-6) >>>>>> [titan01:24256] [ 0] >>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2b336a324100] >>>>>> [titan01:24256] [ 1] >>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2b336a9815f7] >>>>>> [titan01:24256] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b336a982ce8] >>>>>> [titan01:24256] [ 3] >>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b336b44fac5] >>>>>> [titan01:24256] [ 4] >>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b336b5af137] >>>>>> [titan01:24256] [ 5] >>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x407262)[0x2b336b114262] >>>>>> [titan01:24256] [ 6] >>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x7c6c34)[0x2b336b4d3c34] >>>>>> [titan01:24256] [ 7] >>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a9c17)[0x2b336b5b6c17] >>>>>> [titan01:24256] [ 8] 
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8aa2c0)[0x2b336b5b72c0]
>>>>>> [titan01:24256] [ 9]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x744270)[0x2b336b451270]
>>>>>> [titan01:24256] [10]
>>>>>> /usr/lib64/libpthread.so.0(+0x7dc5)[0x2b336a31cdc5]
>>>>>> [titan01:24256] [11] /usr/lib64/libc.so.6(clone+0x6d)[0x2b336aa4228d]
>>>>>> [titan01:24256] *** End of error message ***
>>>>>> -------------------------------------------------------
>>>>>> Primary job terminated normally, but 1 process returned
>>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>>> -------------------------------------------------------
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 0 on node titan01 exited
>>>>>> on signal 6 (Aborted).
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> 3. in <10% of the cases I have a deadlock in MPI.Init. This stays
>>>>>> for more than 15 minutes without returning an error message...
>>>>>>
>>>>>> Can I enable some debug flags to see what happens on the C / Open MPI
>>>>>> side?
>>>>>>
>>>>>> Thanks in advance for your help!
>>>>>> Gundram Leifert
>>>>>>
>>>>>>
>>>>>> On 07/05/2016 06:05 PM, Jason Maldonis wrote:
>>>>>>
>>>>>> After reading your thread, it looks like it may be related to an issue
>>>>>> I had a few weeks ago (I'm a novice though). Maybe my thread will be of
>>>>>> help:
>>>>>> https://www.open-mpi.org/community/lists/users/2016/06/29425.php
>>>>>>
>>>>>> When you say "After a specific number of repetitions the process
>>>>>> either hangs up or returns with a SIGSEGV," do you mean that a single
>>>>>> call hangs, or that at some point during the for loop a call hangs? If
>>>>>> you mean the latter, then it might relate to my issue.
Otherwise my thread >>>>>> probably won't be helpful. >>>>>> >>>>>> Jason Maldonis >>>>>> Research Assistant of Professor Paul Voyles >>>>>> Materials Science Grad Student >>>>>> University of Wisconsin, Madison >>>>>> 1509 University Ave, Rm M142 >>>>>> Madison, WI 53706 >>>>>> maldo...@wisc.edu >>>>>> 608-295-5532 >>>>>> >>>>>> On Tue, Jul 5, 2016 at 9:58 AM, Gundram Leifert < >>>>>> gundram.leif...@uni-rostock.de> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I try to send many byte-arrays via broadcast. After a specific >>>>>>> number of repetitions the process either hangs up or returns with a >>>>>>> SIGSEGV. Does any one can help me solving the problem: >>>>>>> >>>>>>> ########## The code: >>>>>>> >>>>>>> import java.util.Random; >>>>>>> import mpi.*; >>>>>>> >>>>>>> public class TestSendBigFiles { >>>>>>> >>>>>>> public static void log(String msg) { >>>>>>> try { >>>>>>> System.err.println(String.format("%2d/%2d:%s", >>>>>>> MPI.COMM_WORLD.getRank(), MPI.COMM_WORLD.getSize(), msg)); >>>>>>> } catch (MPIException ex) { >>>>>>> System.err.println(String.format("%2s/%2s:%s", "?", "?", >>>>>>> msg)); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> private static int hashcode(byte[] bytearray) { >>>>>>> if (bytearray == null) { >>>>>>> return 0; >>>>>>> } >>>>>>> int hash = 39; >>>>>>> for (int i = 0; i < bytearray.length; i++) { >>>>>>> byte b = bytearray[i]; >>>>>>> hash = hash * 7 + (int) b; >>>>>>> } >>>>>>> return hash; >>>>>>> } >>>>>>> >>>>>>> public static void main(String args[]) throws MPIException { >>>>>>> log("start main"); >>>>>>> MPI.Init(args); >>>>>>> try { >>>>>>> log("initialized done"); >>>>>>> byte[] saveMem = new byte[100000000]; >>>>>>> MPI.COMM_WORLD.barrier(); >>>>>>> Random r = new Random(); >>>>>>> r.nextBytes(saveMem); >>>>>>> if (MPI.COMM_WORLD.getRank() == 0) { >>>>>>> for (int i = 0; i < 1000; i++) { >>>>>>> saveMem[r.nextInt(saveMem.length)]++; >>>>>>> log("i = " + i); >>>>>>> int[] lengthData = new int[]{saveMem.length}; >>>>>>> log("object 
hash = " + hashcode(saveMem)); >>>>>>> log("length = " + lengthData[0]); >>>>>>> MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0); >>>>>>> log("bcast length done (length = " + >>>>>>> lengthData[0] + ")"); >>>>>>> MPI.COMM_WORLD.barrier(); >>>>>>> MPI.COMM_WORLD.bcast(saveMem, lengthData[0], >>>>>>> MPI.BYTE, 0); >>>>>>> log("bcast data done"); >>>>>>> MPI.COMM_WORLD.barrier(); >>>>>>> } >>>>>>> MPI.COMM_WORLD.bcast(new int[]{0}, 1, MPI.INT, 0); >>>>>>> } else { >>>>>>> while (true) { >>>>>>> int[] lengthData = new int[1]; >>>>>>> MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0); >>>>>>> log("bcast length done (length = " + >>>>>>> lengthData[0] + ")"); >>>>>>> if (lengthData[0] == 0) { >>>>>>> break; >>>>>>> } >>>>>>> MPI.COMM_WORLD.barrier(); >>>>>>> saveMem = new byte[lengthData[0]]; >>>>>>> MPI.COMM_WORLD.bcast(saveMem, saveMem.length, >>>>>>> MPI.BYTE, 0); >>>>>>> log("bcast data done"); >>>>>>> MPI.COMM_WORLD.barrier(); >>>>>>> log("object hash = " + hashcode(saveMem)); >>>>>>> } >>>>>>> } >>>>>>> MPI.COMM_WORLD.barrier(); >>>>>>> } catch (MPIException ex) { >>>>>>> System.out.println("caugth error." + ex); >>>>>>> log(ex.getMessage()); >>>>>>> } catch (RuntimeException ex) { >>>>>>> System.out.println("caugth error." 
+ ex); >>>>>>> log(ex.getMessage()); >>>>>>> } finally { >>>>>>> MPI.Finalize(); >>>>>>> } >>>>>>> >>>>>>> } >>>>>>> >>>>>>> } >>>>>>> >>>>>>> >>>>>>> ############ The Error (if it does not just hang up): >>>>>>> >>>>>>> # >>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>> # >>>>>>> # SIGSEGV (0xb) at pc=0x00002b7e9c86e3a1, pid=1172, >>>>>>> tid=47822674495232 >>>>>>> # >>>>>>> # >>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>> # JRE version: 7.0_25-b15 >>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode >>>>>>> linux-amd64 compressed oops) >>>>>>> # Problematic frame: >>>>>>> # # >>>>>>> # SIGSEGV (0xb) at pc=0x00002af69c0693a1, pid=1173, >>>>>>> tid=47238546896640 >>>>>>> # >>>>>>> # JRE version: 7.0_25-b15 >>>>>>> J de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I >>>>>>> # >>>>>>> # Failed to write core dump. Core dumps have been disabled. To >>>>>>> enable core dumping, try "ulimit -c unlimited" before starting Java >>>>>>> again >>>>>>> # >>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode >>>>>>> linux-amd64 compressed oops) >>>>>>> # Problematic frame: >>>>>>> # J de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I >>>>>>> # >>>>>>> # Failed to write core dump. Core dumps have been disabled. 
To >>>>>>> enable core dumping, try "ulimit -c unlimited" before starting Java >>>>>>> again >>>>>>> # >>>>>>> # An error report file with more information is saved as: >>>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1172.log >>>>>>> # An error report file with more information is saved as: >>>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1173.log >>>>>>> # >>>>>>> # If you would like to submit a bug report, please visit: >>>>>>> # <http://bugreport.sun.com/bugreport/crash.jsp> >>>>>>> http://bugreport.sun.com/bugreport/crash.jsp >>>>>>> # >>>>>>> # >>>>>>> # If you would like to submit a bug report, please visit: >>>>>>> # <http://bugreport.sun.com/bugreport/crash.jsp> >>>>>>> http://bugreport.sun.com/bugreport/crash.jsp >>>>>>> # >>>>>>> [titan01:01172] *** Process received signal *** >>>>>>> [titan01:01172] Signal: Aborted (6) >>>>>>> [titan01:01172] Signal code: (-6) >>>>>>> [titan01:01173] *** Process received signal *** >>>>>>> [titan01:01173] Signal: Aborted (6) >>>>>>> [titan01:01173] Signal code: (-6) >>>>>>> [titan01:01172] [ 0] >>>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2b7e9596a100] >>>>>>> [titan01:01172] [ 1] >>>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2b7e95fc75f7] >>>>>>> [titan01:01172] [ 2] >>>>>>> /usr/lib64/libc.so.6(abort+0x148)[0x2b7e95fc8ce8] >>>>>>> [titan01:01172] [ 3] >>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b7e96a95ac5] >>>>>>> [titan01:01172] [ 4] >>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b7e96bf5137] >>>>>>> [titan01:01172] [ 5] >>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2b7e96a995e0] >>>>>>> [titan01:01172] [ 6] [titan01:01173] [ 0] >>>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2af694ded100] >>>>>>> [titan01:01173] [ 1] /usr/lib64/libc.so.6(+0x35670)[0x2b7e95fc7670] >>>>>>> [titan01:01172] [ 7] [0x2b7e9c86e3a1] >>>>>>> [titan01:01172] *** End of error message *** >>>>>>> 
/usr/lib64/libc.so.6(gsignal+0x37)[0x2af69544a5f7] >>>>>>> [titan01:01173] [ 2] >>>>>>> /usr/lib64/libc.so.6(abort+0x148)[0x2af69544bce8] >>>>>>> [titan01:01173] [ 3] >>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2af695f18ac5] >>>>>>> [titan01:01173] [ 4] >>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2af696078137] >>>>>>> [titan01:01173] [ 5] >>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2af695f1c5e0] >>>>>>> [titan01:01173] [ 6] /usr/lib64/libc.so.6(+0x35670)[0x2af69544a670] >>>>>>> [titan01:01173] [ 7] [0x2af69c0693a1] >>>>>>> [titan01:01173] *** End of error message *** >>>>>>> ------------------------------------------------------- >>>>>>> Primary job terminated normally, but 1 process returned >>>>>>> a non-zero exit code. Per user-direction, the job has been aborted. >>>>>>> ------------------------------------------------------- >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> mpirun noticed that process rank 1 with PID 0 on node titan01 exited >>>>>>> on signal 6 (Aborted). >>>>>>> >>>>>>> >>>>>>> ########CONFIGURATION: >>>>>>> I used the ompi master sources from github: >>>>>>> commit 267821f0dd405b5f4370017a287d9a49f92e734a >>>>>>> Author: Gilles Gouaillardet <gil...@rist.or.jp> >>>>>>> Date: Tue Jul 5 13:47:50 2016 +0900 >>>>>>> >>>>>>> ./configure --enable-mpi-java >>>>>>> --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen >>>>>>> --disable-mca-dso >>>>>>> >>>>>>> Thanks a lot for your help! 
>>>>>>> Gundram
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/users/2016/07/29584.php
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2016/07/29585.php
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2016/07/29587.php
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2016/07/29589.php
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2016/07/29590.php
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2016/07/29592.php
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2016/07/29593.php
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2016/07/29601.php
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/07/29603.php
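For reference, the hashcode method named in the problematic frames above (hashcode([B)I in TestSendBigFiles, hashcode([BI)I in TestSendBigFiles2) can be exercised without MPI at all. This standalone sketch reproduces the single-argument variant from the program quoted in the thread; the class name is illustrative:

```java
public class HashcodeSketch {

    // Same rolling hash as TestSendBigFiles.hashcode: seed 39, multiply by 7,
    // add each (sign-extended) byte; int overflow on large arrays is expected.
    static int hashcode(byte[] bytearray) {
        if (bytearray == null) {
            return 0;
        }
        int hash = 39;
        for (byte b : bytearray) {
            hash = hash * 7 + (int) b;
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hashcode(null));                // prints 0
        System.out.println(hashcode(new byte[]{1, 2, 3})); // prints 13443
    }
}
```

Running it in a tight loop over a 100 MB array, without MPI.Init, is one way to check whether the JIT-compiled hash alone can be made to crash.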