I am running out of ideas ...

What if you do not run within SLURM?
What if you do not use '-cp executor.jar'?
Or what if you configure without --disable-dlopen --disable-mca-dso?

if you run
mpirun -np 1 ...
then MPI_Bcast and MPI_Barrier are basically no-ops, so it is really weird
that your program still crashes. Another test is to comment out MPI_Bcast
and MPI_Barrier and try again with -np 1.
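A minimal non-MPI sketch of that isolation test, assuming plain Java (the class name `IsolateJvmTest` is hypothetical; the hash loop is copied from the posted test program). It runs the same allocate-and-hash workload without any MPI calls, so a crash here would point at the JVM rather than Open MPI:

```java
// Runs the same allocate-and-hash loop as TestSendBigFiles, but without any
// MPI calls, to check whether the JVM alone reproduces the SIGSEGV.
// Buffer size and iteration count are CLI arguments; the small defaults are
// only a smoke test -- pass "100000000 1000" to match the original report.
import java.util.Random;

public class IsolateJvmTest {

    // identical to the hashcode() in the posted TestSendBigFiles
    static int hashcode(byte[] bytearray) {
        if (bytearray == null) {
            return 0;
        }
        int hash = 39;
        for (byte b : bytearray) {
            hash = hash * 7 + (int) b;
        }
        return hash;
    }

    public static void main(String[] args) {
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        int iterations = args.length > 1 ? Integer.parseInt(args[1]) : 10;
        Random r = new Random();
        byte[] saveMem = new byte[size];
        r.nextBytes(saveMem);
        for (int i = 0; i < iterations; i++) {
            saveMem[r.nextInt(saveMem.length)]++;
            System.err.println("i = " + i + ", hash = " + hashcode(saveMem));
        }
    }
}
```

If this survives many iterations with the full 100 MB buffer while the MPI version crashes, the JNI/MPI side becomes the main suspect.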

Cheers,

Gilles

On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de>
wrote:

> In all cases, the same error.
> This is my code:
>
> salloc -n 3
> export IPATH_NO_BACKTRACE
> ulimit -s 10240
> mpirun -np 3 java -cp executor.jar
> de.uros.citlab.executor.test.TestSendBigFiles2
>
>
> Also with 1 or 2 cores, the process crashes.
>
>
> On 07/08/2016 12:32 PM, Gilles Gouaillardet wrote:
>
> you can try
> export IPATH_NO_BACKTRACE
> before invoking mpirun (that should not be needed though)
>
> another test is to
> ulimit -s 10240
> before invoking mpirun.
>
> btw, do you use mpirun or srun ?
>
> can you reproduce the crash with 1 or 2 tasks ?
>
> Cheers,
>
> Gilles
>
> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de>
> wrote:
>
>> Hello,
>>
>> configure:
>> ./configure --enable-mpi-java --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25
>> --disable-dlopen --disable-mca-dso
>>
>>
>> 1 node with 3 cores. I use SLURM to allocate one node. I changed --mem,
>> but it has no effect.
>> salloc -n 3
>>
>>
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 256564
>> max locked memory       (kbytes, -l) unlimited
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 100000
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) unlimited
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 4096
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
>>
>> uname -a
>> Linux titan01.service 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31
>> 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> cat /etc/system-release
>> CentOS Linux release 7.2.1511 (Core)
>>
>> what else do you need?
>>
>> Cheers, Gundram
>>
>> On 07/07/2016 10:05 AM, Gilles Gouaillardet wrote:
>>
>> Gundram,
>>
>>
>> can you please provide more information on your environment :
>>
>> - configure command line
>>
>> - OS
>>
>> - memory available
>>
>> - ulimit -a
>>
>> - number of nodes
>>
>> - number of tasks used
>>
>> - interconnect used (if any)
>>
>> - batch manager (if any)
>>
>>
>> Cheers,
>>
>>
>> Gilles
>> On 7/7/2016 4:17 PM, Gundram Leifert wrote:
>>
>> Hello Gilles,
>>
>> I tried your code and it crashes after 3-15 iterations (see (1)). It is
>> always the same error (only the "94" varies).
>>
>> Meanwhile I think Java and MPI use the same memory, because when I delete
>> the hash call, the program sometimes runs for more than 9k iterations.
>> When it crashes, it is at different lines (see (2) and (3)). The crashes
>> also occur on rank 0.
>>
>> ##### (1)#####
>> # Problematic frame:
>> # J 94 C2 de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I
>> (42 bytes) @ 0x00002b03242dc9c4 [0x00002b03242dc860+0x164]
>>
>> #####(2)#####
>> # Problematic frame:
>> # V  [libjvm.so+0x68d0f6]  JavaCallWrapper::JavaCallWrapper(methodHandle,
>> Handle, JavaValue*, Thread*)+0xb6
>>
>> #####(3)#####
>> # Problematic frame:
>> # V  [libjvm.so+0x4183bf]
>> ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f
>>
>> Any more idea?
>>
>> On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:
>>
>> Gundram,
>>
>>
>> FWIW, I cannot reproduce the issue on my box:
>>
>> - centos 7
>>
>> - java version "1.8.0_71"
>>   Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
>>   Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
>>
>>
>> I noticed that on non-zero ranks, saveMem is allocated at each iteration.
>> Ideally, the garbage collector takes care of that and it should not
>> be an issue.
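One way to rule out GC or allocation pressure on the non-zero ranks is to allocate the buffer once and reuse it whenever the incoming length fits. A hypothetical sketch of just that buffer-management pattern (the class and method names are illustrative, and the bcast calls are elided as comments since the pattern itself is plain Java):

```java
// Sketch of reusing the receive buffer across iterations instead of
// allocating a fresh 100 MB array every time. The MPI bcast calls are
// elided; only the buffer-management pattern is shown.
public class ReuseBufferSketch {

    // reallocate only when the current buffer is too small
    static byte[] ensureCapacity(byte[] buf, int length) {
        if (buf == null || buf.length < length) {
            return new byte[length];
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] saveMem = null;
        int[] lengths = {100, 50, 200, 200};   // stand-ins for bcast'd lengths
        for (int len : lengths) {
            saveMem = ensureCapacity(saveMem, len);
            // here: MPI.COMM_WORLD.bcast(saveMem, len, MPI.BYTE, 0);
            System.out.println("buffer capacity = " + saveMem.length);
        }
    }
}
```

Since bcast takes an explicit element count, passing a buffer larger than the broadcast length should be harmless, so the reused buffer only ever grows.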
>>
>> would you mind giving the attached file a try ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:
>>
>> I will have a look at it today
>>
>> how did you configure OpenMPI ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, July 7, 2016, Gundram Leifert <
>> gundram.leif...@uni-rostock.de> wrote:
>>
>>> Hello Gilles,
>>>
>>> thank you for your hints! I made 3 changes; unfortunately, the same error
>>> occurs:
>>>
>>> update ompi:
>>> commit ae8444682f0a7aa158caea08800542ce9874455e
>>> Author: Ralph Castain <r...@open-mpi.org>
>>> Date:   Tue Jul 5 20:07:16 2016 -0700
>>>
>>> update java:
>>> java version "1.8.0_92"
>>> Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
>>> Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
>>>
>>> delete hashcode-lines.
>>>
>>> Now I get this error message 100% of the time, after a varying number of
>>> iterations (15-300):
>>>
>>>  0/ 3:length = 100000000
>>>  0/ 3:bcast length done (length = 100000000)
>>>  1/ 3:bcast length done (length = 100000000)
>>>  2/ 3:bcast length done (length = 100000000)
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGSEGV (0xb) at pc=0x00002b3d022fcd24, pid=16578,
>>> tid=0x00002b3d29716700
>>> #
>>> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build
>>> 1.8.0_92-b14)
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode
>>> linux-amd64 compressed oops)
>>> # Problematic frame:
>>> # V  [libjvm.so+0x414d24]  ciEnv::get_field_by_index(ciInstanceKlass*,
>>> int)+0x94
>>> #
>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>> #
>>> # An error report file with more information is saved as:
>>> # /home/gl069/ompi/bin/executor/hs_err_pid16578.log
>>> #
>>> # Compiler replay data is saved as:
>>> # /home/gl069/ompi/bin/executor/replay_pid16578.log
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>> #
>>> [titan01:16578] *** Process received signal ***
>>> [titan01:16578] Signal: Aborted (6)
>>> [titan01:16578] Signal code:  (-6)
>>> [titan01:16578] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
>>> [titan01:16578] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
>>> [titan01:16578] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
>>> [titan01:16578] [ 3]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
>>> [titan01:16578] [ 4]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
>>> [titan01:16578] [ 5]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
>>> [titan01:16578] [ 6]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
>>> [titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
>>> [titan01:16578] [ 8]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
>>> [titan01:16578] [ 9]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae]
>>> [titan01:16578] [10]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade]
>>> [titan01:16578] [11]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0]
>>> [titan01:16578] [12]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>> [titan01:16578] [13]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>> [titan01:16578] [14]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>> [titan01:16578] [15]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>> [titan01:16578] [16]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>> [titan01:16578] [17]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>> [titan01:16578] [18]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>> [titan01:16578] [19]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>> [titan01:16578] [20]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>> [titan01:16578] [21]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>> [titan01:16578] [22]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3708c2)[0x2b3d022588c2]
>>> [titan01:16578] [23]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3724e7)[0x2b3d0225a4e7]
>>> [titan01:16578] [24]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a817)[0x2b3d02262817]
>>> [titan01:16578] [25]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a92f)[0x2b3d0226292f]
>>> [titan01:16578] [26]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x358edb)[0x2b3d02240edb]
>>> [titan01:16578] [27]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35929e)[0x2b3d0224129e]
>>> [titan01:16578] [28]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3593ce)[0x2b3d022413ce]
>>> [titan01:16578] [29]
>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35973e)[0x2b3d0224173e]
>>> [titan01:16578] *** End of error message ***
>>> -------------------------------------------------------
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 2 with PID 0 on node titan01 exited on
>>> signal 6 (Aborted).
>>>
>>> --------------------------------------------------------------------------
>>>
>>> I don't know if it is a problem of Java or Open MPI - but in recent years,
>>> Java has worked without problems on my machine...
>>>
>>> Thank you for your tips in advance!
>>> Gundram
>>>
>>> On 07/06/2016 03:10 PM, Gilles Gouaillardet wrote:
>>>
>>> Note that a race condition in MPI_Init was fixed in master yesterday.
>>> Can you please update your Open MPI and try again?
>>>
>>> hopefully the hang will disappear.
>>>
>>> Can you reproduce the crash with a simpler (and ideally deterministic)
>>> version of your program?
>>> The crash occurs in hashcode, which makes little sense to me. Can you
>>> also update your JDK?
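For a deterministic variant, one option is to seed the random generator so every run (and every rank) sees identical bytes; a sketch of just the data-generation part (class and method names are hypothetical):

```java
import java.util.Random;

public class DeterministicData {

    // A fixed seed makes the byte stream identical on every run and rank,
    // so a crash (or a hash mismatch) becomes reproducible.
    static byte[] makeData(int length) {
        byte[] data = new byte[length];
        new Random(42L).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        byte[] a = makeData(1000);
        byte[] b = makeData(1000);
        System.out.println("identical = " + java.util.Arrays.equals(a, b));
    }
}
```

With deterministic input, the iteration count at which the crash occurs should stabilize, which makes it much easier to bisect.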
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Wednesday, July 6, 2016, Gundram Leifert <
>>> gundram.leif...@uni-rostock.de> wrote:
>>>
>>>> Hello Jason,
>>>>
>>>> thanks for your response! I think it is another problem. I try to send
>>>> 100 MB of bytes, so there are not many iterations (between 10 and 30). I
>>>> realized that executing this code can result in 3 different errors:
>>>>
>>>> 1. Most often, the posted error message occurs.
>>>>
>>>> 2. In <10% of the cases I get a livelock. I can see 3 Java processes,
>>>> one at 200% and two at 100% CPU utilization. After ~15 minutes
>>>> without new output, this error occurs.
>>>>
>>>>
>>>> [thread 47499823949568 also had an error]
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  Internal Error (safepoint.cpp:317), pid=24256, tid=47500347131648
>>>> #  guarantee(PageArmed == 0) failed: invariant
>>>> #
>>>> # JRE version: 7.0_25-b15
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
>>>> linux-amd64 compressed oops)
>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>> #
>>>> # An error report file with more information is saved as:
>>>> # /home/gl069/ompi/bin/executor/hs_err_pid24256.log
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>> #
>>>> [titan01:24256] *** Process received signal ***
>>>> [titan01:24256] Signal: Aborted (6)
>>>> [titan01:24256] Signal code:  (-6)
>>>> [titan01:24256] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b336a324100]
>>>> [titan01:24256] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b336a9815f7]
>>>> [titan01:24256] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b336a982ce8]
>>>> [titan01:24256] [ 3]
>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b336b44fac5]
>>>> [titan01:24256] [ 4]
>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b336b5af137]
>>>> [titan01:24256] [ 5]
>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x407262)[0x2b336b114262]
>>>> [titan01:24256] [ 6]
>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x7c6c34)[0x2b336b4d3c34]
>>>> [titan01:24256] [ 7]
>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a9c17)[0x2b336b5b6c17]
>>>> [titan01:24256] [ 8]
>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8aa2c0)[0x2b336b5b72c0]
>>>> [titan01:24256] [ 9]
>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x744270)[0x2b336b451270]
>>>> [titan01:24256] [10] /usr/lib64/libpthread.so.0(+0x7dc5)[0x2b336a31cdc5]
>>>> [titan01:24256] [11] /usr/lib64/libc.so.6(clone+0x6d)[0x2b336aa4228d]
>>>> [titan01:24256] *** End of error message ***
>>>> -------------------------------------------------------
>>>> Primary job  terminated normally, but 1 process returned
>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 0 with PID 0 on node titan01 exited on
>>>> signal 6 (Aborted).
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>>
>>>> 3. In <10% of the cases I get a deadlock during MPI.Init. It hangs for
>>>> more than 15 minutes without returning an error message...
>>>>
>>>> Can I enable some debug-flags to see what happens on C / OpenMPI side?
>>>>
>>>> Thanks in advance for your help!
>>>> Gundram Leifert
>>>>
>>>>
>>>> On 07/05/2016 06:05 PM, Jason Maldonis wrote:
>>>>
>>>> After reading your thread, it looks like it may be related to an issue I
>>>> had a few weeks ago (I'm a novice though). Maybe my thread will be of help:
>>>> https://www.open-mpi.org/community/lists/users/2016/06/29425.php
>>>>
>>>> When you say "After a specific number of repetitions the process
>>>> either hangs up or returns with a SIGSEGV", do you mean that a single
>>>> call hangs, or that at some point during the for loop a call hangs? If you
>>>> mean the latter, then it might relate to my issue. Otherwise my thread
>>>> probably won't be helpful.
>>>>
>>>> Jason Maldonis
>>>> Research Assistant of Professor Paul Voyles
>>>> Materials Science Grad Student
>>>> University of Wisconsin, Madison
>>>> 1509 University Ave, Rm M142
>>>> Madison, WI 53706
>>>> maldo...@wisc.edu
>>>> 608-295-5532
>>>>
>>>> On Tue, Jul 5, 2016 at 9:58 AM, Gundram Leifert <
>>>> gundram.leif...@uni-rostock.de> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I try to send many byte arrays via broadcast. After a specific number
>>>>> of repetitions the process either hangs up or returns with a SIGSEGV. Can
>>>>> anyone help me solve the problem?
>>>>>
>>>>> ########## The code:
>>>>>
>>>>> import java.util.Random;
>>>>> import mpi.*;
>>>>>
>>>>> public class TestSendBigFiles {
>>>>>
>>>>>     public static void log(String msg) {
>>>>>         try {
>>>>>             System.err.println(String.format("%2d/%2d:%s",
>>>>> MPI.COMM_WORLD.getRank(), MPI.COMM_WORLD.getSize(), msg));
>>>>>         } catch (MPIException ex) {
>>>>>             System.err.println(String.format("%2s/%2s:%s", "?", "?",
>>>>> msg));
>>>>>         }
>>>>>     }
>>>>>
>>>>>     private static int hashcode(byte[] bytearray) {
>>>>>         if (bytearray == null) {
>>>>>             return 0;
>>>>>         }
>>>>>         int hash = 39;
>>>>>         for (int i = 0; i < bytearray.length; i++) {
>>>>>             byte b = bytearray[i];
>>>>>             hash = hash * 7 + (int) b;
>>>>>         }
>>>>>         return hash;
>>>>>     }
>>>>>
>>>>>     public static void main(String args[]) throws MPIException {
>>>>>         log("start main");
>>>>>         MPI.Init(args);
>>>>>         try {
>>>>>             log("initialized done");
>>>>>             byte[] saveMem = new byte[100000000];
>>>>>             MPI.COMM_WORLD.barrier();
>>>>>             Random r = new Random();
>>>>>             r.nextBytes(saveMem);
>>>>>             if (MPI.COMM_WORLD.getRank() == 0) {
>>>>>                 for (int i = 0; i < 1000; i++) {
>>>>>                     saveMem[r.nextInt(saveMem.length)]++;
>>>>>                     log("i = " + i);
>>>>>                     int[] lengthData = new int[]{saveMem.length};
>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>                     log("length = " + lengthData[0]);
>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>                     log("bcast length done (length = " + lengthData[0]
>>>>> + ")");
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                     MPI.COMM_WORLD.bcast(saveMem, lengthData[0],
>>>>> MPI.BYTE, 0);
>>>>>                     log("bcast data done");
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                 }
>>>>>                 MPI.COMM_WORLD.bcast(new int[]{0}, 1, MPI.INT, 0);
>>>>>             } else {
>>>>>                 while (true) {
>>>>>                     int[] lengthData = new int[1];
>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>                     log("bcast length done (length = " + lengthData[0]
>>>>> + ")");
>>>>>                     if (lengthData[0] == 0) {
>>>>>                         break;
>>>>>                     }
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                     saveMem = new byte[lengthData[0]];
>>>>>                     MPI.COMM_WORLD.bcast(saveMem, saveMem.length,
>>>>> MPI.BYTE, 0);
>>>>>                     log("bcast data done");
>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>                 }
>>>>>             }
>>>>>             MPI.COMM_WORLD.barrier();
>>>>>         } catch (MPIException ex) {
>>>>>             System.out.println("caught error." + ex);
>>>>>             log(ex.getMessage());
>>>>>         } catch (RuntimeException ex) {
>>>>>             System.out.println("caught error." + ex);
>>>>>             log(ex.getMessage());
>>>>>         } finally {
>>>>>             MPI.Finalize();
>>>>>         }
>>>>>
>>>>>     }
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>> ############ The Error (if it does not just hang up):
>>>>>
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  SIGSEGV (0xb) at pc=0x00002b7e9c86e3a1, pid=1172, tid=47822674495232
>>>>> #
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> # JRE version: 7.0_25-b15
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
>>>>> linux-amd64 compressed oops)
>>>>> # Problematic frame:
>>>>> # #
>>>>> #  SIGSEGV (0xb) at pc=0x00002af69c0693a1, pid=1173, tid=47238546896640
>>>>> #
>>>>> # JRE version: 7.0_25-b15
>>>>> J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>> #
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
>>>>> linux-amd64 compressed oops)
>>>>> # Problematic frame:
>>>>> # J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>> #
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>> # An error report file with more information is saved as:
>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1172.log
>>>>> # An error report file with more information is saved as:
>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1173.log
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> #    http://bugreport.sun.com/bugreport/crash.jsp
>>>>> #
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> #    http://bugreport.sun.com/bugreport/crash.jsp
>>>>> #
>>>>> [titan01:01172] *** Process received signal ***
>>>>> [titan01:01172] Signal: Aborted (6)
>>>>> [titan01:01172] Signal code:  (-6)
>>>>> [titan01:01173] *** Process received signal ***
>>>>> [titan01:01173] Signal: Aborted (6)
>>>>> [titan01:01173] Signal code:  (-6)
>>>>> [titan01:01172] [ 0]
>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2b7e9596a100]
>>>>> [titan01:01172] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b7e95fc75f7]
>>>>> [titan01:01172] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b7e95fc8ce8]
>>>>> [titan01:01172] [ 3]
>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b7e96a95ac5]
>>>>> [titan01:01172] [ 4]
>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b7e96bf5137]
>>>>> [titan01:01172] [ 5]
>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2b7e96a995e0]
>>>>> [titan01:01172] [ 6] [titan01:01173] [ 0]
>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2af694ded100]
>>>>> [titan01:01173] [ 1] /usr/lib64/libc.so.6(+0x35670)[0x2b7e95fc7670]
>>>>> [titan01:01172] [ 7] [0x2b7e9c86e3a1]
>>>>> [titan01:01172] *** End of error message ***
>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2af69544a5f7]
>>>>> [titan01:01173] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2af69544bce8]
>>>>> [titan01:01173] [ 3]
>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2af695f18ac5]
>>>>> [titan01:01173] [ 4]
>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2af696078137]
>>>>> [titan01:01173] [ 5]
>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2af695f1c5e0]
>>>>> [titan01:01173] [ 6] /usr/lib64/libc.so.6(+0x35670)[0x2af69544a670]
>>>>> [titan01:01173] [ 7] [0x2af69c0693a1]
>>>>> [titan01:01173] *** End of error message ***
>>>>> -------------------------------------------------------
>>>>> Primary job  terminated normally, but 1 process returned
>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that process rank 1 with PID 0 on node titan01 exited
>>>>> on signal 6 (Aborted).
>>>>>
>>>>>
>>>>> ########CONFIGURATION:
>>>>> I used the ompi master sources from github:
>>>>> commit 267821f0dd405b5f4370017a287d9a49f92e734a
>>>>> Author: Gilles Gouaillardet <gil...@rist.or.jp>
>>>>> Date:   Tue Jul 5 13:47:50 2016 +0900
>>>>>
>>>>> ./configure --enable-mpi-java
>>>>> --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen
>>>>> --disable-mca-dso
>>>>>
>>>>> Thanks a lot for your help!
>>>>> Gundram
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2016/07/29584.php
>>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2016/07/29585.php
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2016/07/29587.php
>>>
>>>
>>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2016/07/29589.php
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2016/07/29590.php
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2016/07/29592.php
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2016/07/29593.php
>>
>>
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/07/29601.php
>
>
>
