I am running out of ideas... What if you do not run within SLURM? What if you do not use '-cp executor.jar', or what if you configure without --disable-dlopen --disable-mca-dso?
If you mpirun -np 1 ..., then MPI_Bcast and MPI_Barrier are basically no-ops, so it is really strange that your program is still crashing. Another test is to comment out MPI_Bcast and MPI_Barrier and try again with -np 1.

Cheers,

Gilles

On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:

> In all cases, the same error.
> This is my code:
>
> salloc -n 3
> export IPATH_NO_BACKTRACE
> ulimit -s 10240
> mpirun -np 3 java -cp executor.jar de.uros.citlab.executor.test.TestSendBigFiles2
>
> The process also crashes with one or two cores.
>
> On 07/08/2016 12:32 PM, Gilles Gouaillardet wrote:
>
> You can try
> export IPATH_NO_BACKTRACE
> before invoking mpirun (that should not be needed, though).
>
> Another test is to
> ulimit -s 10240
> before invoking mpirun.
>
> By the way, do you use mpirun or srun?
>
> Can you reproduce the crash with 1 or 2 tasks?
>
> Cheers,
>
> Gilles
>
> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>
>> Hello,
>>
>> configure:
>> ./configure --enable-mpi-java --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen --disable-mca-dso
>>
>> 1 node with 3 cores. I use SLURM to allocate one node. I changed --mem, but it has no effect.
>> salloc -n 3
>>
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 256564
>> max locked memory       (kbytes, -l) unlimited
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 100000
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) unlimited
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 4096
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
>>
>> uname -a
>> Linux titan01.service 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> cat /etc/system-release
>> CentOS Linux release 7.2.1511 (Core)
>>
>> What else do you need?
>>
>> Cheers, Gundram
>>
>> On 07/07/2016 10:05 AM, Gilles Gouaillardet wrote:
>>
>> Gundram,
>>
>> Can you please provide more information on your environment:
>>
>> - configure command line
>> - OS
>> - memory available
>> - ulimit -a
>> - number of nodes
>> - number of tasks used
>> - interconnect used (if any)
>> - batch manager (if any)
>>
>> Cheers,
>>
>> Gilles
>>
>> On 7/7/2016 4:17 PM, Gundram Leifert wrote:
>>
>> Hello Gilles,
>>
>> I tried your code and it crashes after 3-15 iterations (see (1)). It is always the same error (only the "94" varies).
>>
>> Meanwhile I think Java and MPI use the same memory, because when I delete the hash call, the program sometimes runs more than 9k iterations. When it crashes, the problematic frames differ (see (2) and (3)). The crashes also occur on rank 0.
>>
>> ##### (1) #####
>> # Problematic frame:
>> # J 94 C2 de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I (42 bytes) @ 0x00002b03242dc9c4 [0x00002b03242dc860+0x164]
>>
>> ##### (2) #####
>> # Problematic frame:
>> # V  [libjvm.so+0x68d0f6]  JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*, Thread*)+0xb6
>>
>> ##### (3) #####
>> # Problematic frame:
>> # V  [libjvm.so+0x4183bf]  ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f
>>
>> Any more ideas?
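As a concrete form of the single-rank test suggested at the top of this message (mpirun -np 1, with the collectives commented out), a minimal sketch could look like the following. The class name is hypothetical and the buffer size matches the test program quoted further down in this thread; if this still crashes under -np 1, the problem is unlikely to involve the MPI collectives at all.

    import mpi.MPI;
    import mpi.MPIException;

    // Hypothetical class name; not part of the original thread.
    public class TestSingleRank {

        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            try {
                int rank = MPI.COMM_WORLD.getRank();
                int size = MPI.COMM_WORLD.getSize();
                byte[] saveMem = new byte[100000000];   // same 100 MB buffer as in the original test

                for (int i = 0; i < 1000; i++) {
                    saveMem[i % saveMem.length]++;      // touch the buffer so the loop is not optimized away
                    // Collectives commented out on purpose: with -np 1 they would be
                    // no-ops anyway, and removing them isolates the pure Java part.
                    // MPI.COMM_WORLD.bcast(saveMem, saveMem.length, MPI.BYTE, 0);
                    // MPI.COMM_WORLD.barrier();
                    System.err.println(rank + "/" + size + ": i = " + i);
                }
            } finally {
                MPI.Finalize();
            }
        }
    }

It would be launched the same way as the original test, just with -np 1.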
>> On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:
>>
>> Gundram,
>>
>> FWIW, I cannot reproduce the issue on my box:
>>
>> - CentOS 7
>> - java version "1.8.0_71"
>>   Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
>>   Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
>>
>> I noticed that on non-zero ranks, saveMem is allocated at each iteration. Ideally, the garbage collector can take care of that, and this should not be an issue.
>>
>> Would you mind giving the attached file a try?
>>
>> Cheers,
>>
>> Gilles
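Gilles's observation above concerns the non-root branch of the test program quoted at the end of this thread, where saveMem = new byte[lengthData[0]] runs on every iteration. His attached file is not reproduced in the archive; as a hedged sketch of that kind of change, the receive loop could allocate the buffer once and reuse it (class and method names are illustrative only):

    import mpi.MPI;
    import mpi.MPIException;

    // Illustrative drop-in replacement for the non-root branch of the
    // TestSendBigFiles loop: reuse one receive buffer instead of allocating
    // a fresh 100 MB array on every iteration.
    public class ReceiverLoop {

        static void receiveAll() throws MPIException {
            byte[] saveMem = new byte[0];                      // allocated lazily, then reused
            while (true) {
                int[] lengthData = new int[1];
                MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
                if (lengthData[0] == 0) {
                    break;                                     // root signals the end with length 0
                }
                MPI.COMM_WORLD.barrier();
                if (saveMem.length < lengthData[0]) {
                    saveMem = new byte[lengthData[0]];         // grow only when the announced length changes
                }
                MPI.COMM_WORLD.bcast(saveMem, lengthData[0], MPI.BYTE, 0);
                MPI.COMM_WORLD.barrier();
                // logging and hashing from the original test omitted for brevity
            }
        }
    }

In the test as posted, the length is constant, so after the first pass no further allocation would happen on the non-root ranks.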
>> On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:
>>
>> I will have a look at it today.
>>
>> How did you configure Open MPI?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, July 7, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>>
>>> Hello Gilles,
>>>
>>> thank you for your hints! I made three changes; unfortunately, the same error occurs:
>>>
>>> Updated ompi:
>>> commit ae8444682f0a7aa158caea08800542ce9874455e
>>> Author: Ralph Castain <r...@open-mpi.org>
>>> Date:   Tue Jul 5 20:07:16 2016 -0700
>>>
>>> Updated java:
>>> java version "1.8.0_92"
>>> Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
>>> Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
>>>
>>> Deleted the hashcode lines.
>>>
>>> Now I get this error message 100% of the time, after a varying number of iterations (15-300):
>>>
>>>  0/ 3:length = 100000000
>>>  0/ 3:bcast length done (length = 100000000)
>>>  1/ 3:bcast length done (length = 100000000)
>>>  2/ 3:bcast length done (length = 100000000)
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGSEGV (0xb) at pc=0x00002b3d022fcd24, pid=16578, tid=0x00002b3d29716700
>>> #
>>> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode linux-amd64 compressed oops)
>>> # Problematic frame:
>>> # V  [libjvm.so+0x414d24]  ciEnv::get_field_by_index(ciInstanceKlass*, int)+0x94
>>> #
>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>> #
>>> # An error report file with more information is saved as:
>>> # /home/gl069/ompi/bin/executor/hs_err_pid16578.log
>>> #
>>> # Compiler replay data is saved as:
>>> # /home/gl069/ompi/bin/executor/replay_pid16578.log
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>> #
>>> [titan01:16578] *** Process received signal ***
>>> [titan01:16578] Signal: Aborted (6)
>>> [titan01:16578] Signal code:  (-6)
>>> [titan01:16578] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
>>> [titan01:16578] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
>>> [titan01:16578] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
>>> [titan01:16578] [ 3] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
>>> [titan01:16578] [ 4] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
>>> [titan01:16578] [ 5] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
>>> [titan01:16578] [ 6] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
>>> [titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
>>> [titan01:16578] [ 8] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
>>> [titan01:16578] [ 9] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae]
>>> [titan01:16578] [10] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade]
>>> [titan01:16578] [11] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0]
>>> [titan01:16578] [12] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>> [titan01:16578] [13] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>> [titan01:16578] [14] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>> [titan01:16578] [15] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>> [titan01:16578] [16] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>> [titan01:16578] [17] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>> [titan01:16578] [18] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>> [titan01:16578] [19] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>> [titan01:16578] [20] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>> [titan01:16578] [21] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>> [titan01:16578] [22] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3708c2)[0x2b3d022588c2]
>>> [titan01:16578] [23] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3724e7)[0x2b3d0225a4e7]
>>> [titan01:16578] [24] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a817)[0x2b3d02262817]
>>> [titan01:16578] [25] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a92f)[0x2b3d0226292f]
>>> [titan01:16578] [26] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x358edb)[0x2b3d02240edb]
>>> [titan01:16578] [27] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35929e)[0x2b3d0224129e]
>>> [titan01:16578] [28] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3593ce)[0x2b3d022413ce]
>>> [titan01:16578] [29] /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35973e)[0x2b3d0224173e]
>>> [titan01:16578] *** End of error message ***
>>> -------------------------------------------------------
>>> Primary job terminated normally, but 1 process returned
>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 2 with PID 0 on node titan01 exited on signal 6 (Aborted).
>>> --------------------------------------------------------------------------
>>>
>>> I don't know if it is a problem with Java or OMPI, but over the last years Java has worked with no problems on my machine...
>>>
>>> Thank you in advance for your tips!
>>> Gundram
>>>
>>> On 07/06/2016 03:10 PM, Gilles Gouaillardet wrote:
>>>
>>> Note that a race condition in MPI_Init was fixed yesterday in master.
>>> Can you please update your Open MPI and try again?
>>>
>>> Hopefully the hang will disappear.
>>>
>>> Can you reproduce the crash with a simpler (and ideally deterministic) version of your program?
>>> The crash occurs in hashcode, and this makes little sense to me. Can you also update your JDK?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Wednesday, July 6, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>>>
>>>> Hello Jason,
>>>>
>>>> thanks for your response! I think it is another problem. I try to send 100 MB of bytes, so there are not many tries (between 10 and 30). I realized that running this code can result in three different errors:
>>>>
>>>> 1. Most often, the posted error message occurs.
>>>>
>>>> 2. In <10% of the cases I get a livelock. I can see 3 Java processes, one at 200% and two at 100% CPU utilization. After ~15 minutes without new output, this error occurs:
>>>>
>>>> [thread 47499823949568 also had an error]
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  Internal Error (safepoint.cpp:317), pid=24256, tid=47500347131648
>>>> #  guarantee(PageArmed == 0) failed: invariant
>>>> #
>>>> # JRE version: 7.0_25-b15
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>> #
>>>> # An error report file with more information is saved as:
>>>> # /home/gl069/ompi/bin/executor/hs_err_pid24256.log
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>> #
>>>> [titan01:24256] *** Process received signal ***
>>>> [titan01:24256] Signal: Aborted (6)
>>>> [titan01:24256] Signal code:  (-6)
>>>> [titan01:24256] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b336a324100]
>>>> [titan01:24256] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b336a9815f7]
>>>> [titan01:24256] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b336a982ce8]
>>>> [titan01:24256] [ 3] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b336b44fac5]
>>>> [titan01:24256] [ 4] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b336b5af137]
>>>> [titan01:24256] [ 5] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x407262)[0x2b336b114262]
>>>> [titan01:24256] [ 6] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x7c6c34)[0x2b336b4d3c34]
>>>> [titan01:24256] [ 7] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a9c17)[0x2b336b5b6c17]
>>>> [titan01:24256] [ 8] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8aa2c0)[0x2b336b5b72c0]
>>>> [titan01:24256] [ 9] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x744270)[0x2b336b451270]
>>>> [titan01:24256] [10] /usr/lib64/libpthread.so.0(+0x7dc5)[0x2b336a31cdc5]
>>>> [titan01:24256] [11] /usr/lib64/libc.so.6(clone+0x6d)[0x2b336aa4228d]
>>>> [titan01:24256] *** End of error message ***
>>>> -------------------------------------------------------
>>>> Primary job terminated normally, but 1 process returned
>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 0 with PID 0 on node titan01 exited on signal 6 (Aborted).
>>>> --------------------------------------------------------------------------
>>>>
>>>> 3. In <10% of the cases I get a deadlock during MPI.Init. It stays there for more than 15 minutes without returning an error message...
>>>>
>>>> Can I enable some debug flags to see what happens on the C / Open MPI side?
>>>>
>>>> Thanks in advance for your help!
>>>> Gundram Leifert
>>>>
>>>> On 07/05/2016 06:05 PM, Jason Maldonis wrote:
>>>>
>>>> After reading your thread, it looks like it may be related to an issue I had a few weeks ago (I'm a novice though). Maybe my thread will be of help:
>>>> https://www.open-mpi.org/community/lists/users/2016/06/29425.php
>>>>
>>>> When you say "After a specific number of repetitions the process either hangs up or returns with a SIGSEGV", do you mean that a single call hangs, or that at some point during the for loop a call hangs? If you mean the latter, then it might relate to my issue. Otherwise my thread probably won't be helpful.
>>>> Jason Maldonis
>>>> Research Assistant of Professor Paul Voyles
>>>> Materials Science Grad Student
>>>> University of Wisconsin, Madison
>>>> 1509 University Ave, Rm M142
>>>> Madison, WI 53706
>>>> maldo...@wisc.edu
>>>> 608-295-5532
>>>>
>>>> On Tue, Jul 5, 2016 at 9:58 AM, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to send many byte arrays via broadcast. After a specific number of repetitions, the process either hangs or returns with a SIGSEGV. Can anyone help me solve the problem?
>>>>>
>>>>> ########## The code:
>>>>>
>>>>> import java.util.Random;
>>>>> import mpi.*;
>>>>>
>>>>> public class TestSendBigFiles {
>>>>>
>>>>>     public static void log(String msg) {
>>>>>         try {
>>>>>             System.err.println(String.format("%2d/%2d:%s", MPI.COMM_WORLD.getRank(), MPI.COMM_WORLD.getSize(), msg));
>>>>>         } catch (MPIException ex) {
>>>>>             System.err.println(String.format("%2s/%2s:%s", "?", "?", msg));
>>>>>         }
>>>>>     }
>>>>>
>>>>>     private static int hashcode(byte[] bytearray) {
>>>>>         if (bytearray == null) {
>>>>>             return 0;
>>>>>         }
>>>>>         int hash = 39;
>>>>>         for (int i = 0; i < bytearray.length; i++) {
>>>>>             byte b = bytearray[i];
>>>>>             hash = hash * 7 + (int) b;
>>>>>         }
>>>>>         return hash;
>>>>>     }
>>>>>
>>>>>     public static void main(String args[]) throws MPIException {
>>>>>         log("start main");
>>>>>         MPI.Init(args);
>>>>>         try {
>>>>>             log("initialized done");
>>>>>             byte[] saveMem = new byte[100000000];
>>>>>             MPI.COMM_WORLD.barrier();
>>>>>             Random r = new Random();
>>>>>             r.nextBytes(saveMem);
>>>>>             if (MPI.COMM_WORLD.getRank() == 0) {
>>>>>                 for (int i = 0; i < 1000; i++) {
>>>>>                     saveMem[r.nextInt(saveMem.length)]++;
>>>>>                     log("i = " + i);
>>>>>                     int[] lengthData = new int[]{saveMem.length};
>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>                     log("length = " + lengthData[0]);
>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>                     log("bcast length done (length = " + lengthData[0] + ")");
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                     MPI.COMM_WORLD.bcast(saveMem, lengthData[0], MPI.BYTE, 0);
>>>>>                     log("bcast data done");
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                 }
>>>>>                 MPI.COMM_WORLD.bcast(new int[]{0}, 1, MPI.INT, 0);
>>>>>             } else {
>>>>>                 while (true) {
>>>>>                     int[] lengthData = new int[1];
>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>                     log("bcast length done (length = " + lengthData[0] + ")");
>>>>>                     if (lengthData[0] == 0) {
>>>>>                         break;
>>>>>                     }
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                     saveMem = new byte[lengthData[0]];
>>>>>                     MPI.COMM_WORLD.bcast(saveMem, saveMem.length, MPI.BYTE, 0);
>>>>>                     log("bcast data done");
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>                 }
>>>>>             }
>>>>>             MPI.COMM_WORLD.barrier();
>>>>>         } catch (MPIException ex) {
>>>>>             System.out.println("caught error." + ex);
>>>>>             log(ex.getMessage());
>>>>>         } catch (RuntimeException ex) {
>>>>>             System.out.println("caught error." + ex);
>>>>>             log(ex.getMessage());
>>>>>         } finally {
>>>>>             MPI.Finalize();
>>>>>         }
>>>>>     }
>>>>> }
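For comparison only (this is not something proposed in the thread): the Open MPI Java bindings can also work with a direct buffer allocated via MPI.newByteBuffer instead of a heap byte[], which keeps the broadcast payload outside the garbage-collected heap. A minimal hedged sketch, assuming that API and using the same 100 MB size, might look like this:

    import java.nio.ByteBuffer;
    import mpi.MPI;
    import mpi.MPIException;

    // Hypothetical variant of the broadcast above using a direct buffer
    // allocated through the Java bindings instead of a heap byte[].
    public class TestSendBigFilesDirect {

        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            try {
                int length = 100000000;
                ByteBuffer saveMem = MPI.newByteBuffer(length);  // direct buffer, outside the Java heap
                if (MPI.COMM_WORLD.getRank() == 0) {
                    for (int i = 0; i < 1000; i++) {
                        saveMem.put(i, (byte) i);                // fill a small prefix on the root
                    }
                }
                MPI.COMM_WORLD.bcast(saveMem, length, MPI.BYTE, 0);
                MPI.COMM_WORLD.barrier();
                System.err.println(MPI.COMM_WORLD.getRank() + ": first byte = " + saveMem.get(0));
            } finally {
                MPI.Finalize();
            }
        }
    }

Whether this avoids the reported JIT/hashcode crash is an open question here; it mainly removes the interaction between a garbage-collected array and the native broadcast.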
>>>>> ############ The Error (if it does not just hang up):
>>>>>
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  SIGSEGV (0xb) at pc=0x00002b7e9c86e3a1, pid=1172, tid=47822674495232
>>>>> #
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> # JRE version: 7.0_25-b15
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
>>>>> # Problematic frame:
>>>>> # #
>>>>> #  SIGSEGV (0xb) at pc=0x00002af69c0693a1, pid=1173, tid=47238546896640
>>>>> #
>>>>> # JRE version: 7.0_25-b15
>>>>> J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>> #
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
>>>>> # Problematic frame:
>>>>> # J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>> #
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>> # An error report file with more information is saved as:
>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1172.log
>>>>> # An error report file with more information is saved as:
>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1173.log
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>> #
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>> #
>>>>> [titan01:01172] *** Process received signal ***
>>>>> [titan01:01172] Signal: Aborted (6)
>>>>> [titan01:01172] Signal code:  (-6)
>>>>> [titan01:01173] *** Process received signal ***
>>>>> [titan01:01173] Signal: Aborted (6)
>>>>> [titan01:01173] Signal code:  (-6)
>>>>> [titan01:01172] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b7e9596a100]
>>>>> [titan01:01172] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b7e95fc75f7]
>>>>> [titan01:01172] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b7e95fc8ce8]
>>>>> [titan01:01172] [ 3] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b7e96a95ac5]
>>>>> [titan01:01172] [ 4] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b7e96bf5137]
>>>>> [titan01:01172] [ 5] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2b7e96a995e0]
>>>>> [titan01:01172] [ 6] [titan01:01173] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2af694ded100]
>>>>> [titan01:01173] [ 1] /usr/lib64/libc.so.6(+0x35670)[0x2b7e95fc7670]
>>>>> [titan01:01172] [ 7] [0x2b7e9c86e3a1]
>>>>> [titan01:01172] *** End of error message ***
>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2af69544a5f7]
>>>>> [titan01:01173] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2af69544bce8]
>>>>> [titan01:01173] [ 3] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2af695f18ac5]
>>>>> [titan01:01173] [ 4] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2af696078137]
>>>>> [titan01:01173] [ 5] /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2af695f1c5e0]
>>>>> [titan01:01173] [ 6] /usr/lib64/libc.so.6(+0x35670)[0x2af69544a670]
>>>>> [titan01:01173] [ 7] [0x2af69c0693a1]
>>>>> [titan01:01173] *** End of error message ***
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that process rank 1 with PID 0 on node titan01 exited on signal 6 (Aborted).
>>>>>
>>>>> ######## CONFIGURATION:
>>>>> I used the OMPI master sources from GitHub:
>>>>> commit 267821f0dd405b5f4370017a287d9a49f92e734a
>>>>> Author: Gilles Gouaillardet <gil...@rist.or.jp>
>>>>> Date:   Tue Jul 5 13:47:50 2016 +0900
>>>>>
>>>>> ./configure --enable-mpi-java --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen --disable-mca-dso
>>>>>
>>>>> Thanks a lot for your help!
>>>>> Gundram