Hi Gundram

Could you configure without the --disable-dlopen option and retry?

Howard

On Friday, July 8, 2016, Gilles Gouaillardet wrote:

> the JVM sets its own signal handlers, and it is important Open MPI does
> not override them.
> this is what previously happened with PSM (InfiniPath), but that has been
> solved since.
> you might be linking with a third-party library that hijacks signal
> handlers and causes the crash
> (which would explain why I cannot reproduce the issue)
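>
> one way to check for such a conflict (a sketch, assuming the usual JDK
> layout, where libjsig.so sits next to the server/ directory that holds
> libjvm.so) is to preload the JVM signal-chaining library, so foreign
> handlers are chained instead of replacing the JVM's own:
>
> export LD_PRELOAD=/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/libjsig.so
> mpirun -x LD_PRELOAD -np 3 java -cp executor.jar de.uros.citlab.executor.test.TestSendBigFiles2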
>
> the master branch has a revamped memory patcher (compared to v2.x or
> v1.10), and that could have some bad interactions with the JVM, so you
> might also give v2.x a try
>
> Cheers,
>
> Gilles
>
> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:
>
>> You made the best of it... thanks a lot!
>>
>> Without MPI it runs.
>> Just adding MPI.Init() causes the crash!
>>
>> Maybe I installed something wrong...
>>
>> install the newest automake, autoconf, m4, and libtoolize, in the right
>> order and with the same prefix
>> check out ompi
>> autogen
>> configure with the same prefix, pointing to the same JDK I use later
>> make
>> make install
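>>
>> In command form, those steps are roughly the following (a sketch; PREFIX
>> and JDK are placeholders for my actual install prefix and JDK path):
>>
>> git clone https://github.com/open-mpi/ompi.git && cd ompi
>> ./autogen.pl
>> ./configure --prefix=$PREFIX --enable-mpi-java --with-jdk-dir=$JDK --disable-dlopen --disable-mca-dso
>> make && make install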
>>
>> I will test some different configurations of ./configure...
>>
>>
>> On 07/08/2016 01:40 PM, Gilles Gouaillardet wrote:
>>
>> I am running out of ideas ...
>>
>> what if you do not run within slurm ?
>> what if you do not use '-cp executor.jar'
>> or what if you configure without --disable-dlopen --disable-mca-dso ?
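>>
>> i.e. your earlier configure line minus those two flags:
>>
>> ./configure --enable-mpi-java --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25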
>>
>> if you
>> mpirun -np 1 ...
>> then MPI_Bcast and MPI_Barrier are basically no-ops, so it is really weird
>> that your program is still crashing. Another test is to comment out MPI_Bcast
>> and MPI_Barrier and try again with -np 1
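>>
>> for example, reusing your earlier command line with a single task:
>>
>> mpirun -np 1 java -cp executor.jar de.uros.citlab.executor.test.TestSendBigFiles2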
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de>
>> wrote:
>>
>>> In all cases the same error.
>>> This is my code:
>>>
>>> salloc -n 3
>>> export IPATH_NO_BACKTRACE
>>> ulimit -s 10240
>>> mpirun -np 3 java -cp executor.jar
>>> de.uros.citlab.executor.test.TestSendBigFiles2
>>>
>>>
>>> Also with 1 or 2 cores, the process crashes.
>>>
>>>
>>> On 07/08/2016 12:32 PM, Gilles Gouaillardet wrote:
>>>
>>> you can try
>>> export IPATH_NO_BACKTRACE
>>> before invoking mpirun (that should not be needed though)
>>>
>>> another test is to
>>> ulimit -s 10240
>>> before invoking mpirun.
>>>
>>> btw, do you use mpirun or srun ?
>>>
>>> can you reproduce the crash with 1 or 2 tasks ?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Friday, July 8, 2016, Gundram Leifert <gundram.leif...@uni-rostock.de>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> configure:
>>>> ./configure --enable-mpi-java
>>>> --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen
>>>> --disable-mca-dso
>>>>
>>>>
>>>> 1 node with 3 cores. I use SLURM to allocate one node. I changed --mem,
>>>> but it has no effect.
>>>> salloc -n 3
>>>>
>>>>
>>>> core file size          (blocks, -c) 0
>>>> data seg size           (kbytes, -d) unlimited
>>>> scheduling priority             (-e) 0
>>>> file size               (blocks, -f) unlimited
>>>> pending signals                 (-i) 256564
>>>> max locked memory       (kbytes, -l) unlimited
>>>> max memory size         (kbytes, -m) unlimited
>>>> open files                      (-n) 100000
>>>> pipe size            (512 bytes, -p) 8
>>>> POSIX message queues     (bytes, -q) 819200
>>>> real-time priority              (-r) 0
>>>> stack size              (kbytes, -s) unlimited
>>>> cpu time               (seconds, -t) unlimited
>>>> max user processes              (-u) 4096
>>>> virtual memory          (kbytes, -v) unlimited
>>>> file locks                      (-x) unlimited
>>>>
>>>> uname -a
>>>> Linux titan01.service 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31
>>>> 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> cat /etc/system-release
>>>> CentOS Linux release 7.2.1511 (Core)
>>>>
>>>> what else do you need?
>>>>
>>>> Cheers, Gundram
>>>>
>>>> On 07/07/2016 10:05 AM, Gilles Gouaillardet wrote:
>>>>
>>>> Gundram,
>>>>
>>>>
>>>> can you please provide more information on your environment :
>>>>
>>>> - configure command line
>>>>
>>>> - OS
>>>>
>>>> - memory available
>>>>
>>>> - ulimit -a
>>>>
>>>> - number of nodes
>>>>
>>>> - number of tasks used
>>>>
>>>> - interconnect used (if any)
>>>>
>>>> - batch manager (if any)
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> Gilles
>>>> On 7/7/2016 4:17 PM, Gundram Leifert wrote:
>>>>
>>>> Hello Gilles,
>>>>
>>>> I tried your code and it crashes after 3-15 iterations (see (1)). It is
>>>> always the same error (only the "94" varies).
>>>>
>>>> Meanwhile I think Java and MPI use the same memory, because when I
>>>> delete the hash call, the program sometimes runs more than 9k iterations.
>>>> When it crashes, it is at different lines (see (2) and (3)). The
>>>> crashes also occur on rank 0.
>>>>
>>>> ##### (1)#####
>>>> # Problematic frame:
>>>> # J 94 C2 de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I
>>>> (42 bytes) @ 0x00002b03242dc9c4 [0x00002b03242dc860+0x164]
>>>>
>>>> #####(2)#####
>>>> # Problematic frame:
>>>> # V  [libjvm.so+0x68d0f6]
>>>> JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*,
>>>> Thread*)+0xb6
>>>>
>>>> #####(3)#####
>>>> # Problematic frame:
>>>> # V  [libjvm.so+0x4183bf]
>>>> ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f
>>>>
>>>> Any more ideas?
>>>>
>>>> On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:
>>>>
>>>> Gundram,
>>>>
>>>>
>>>> FWIW, I cannot reproduce the issue on my box:
>>>>
>>>> - centos 7
>>>>
>>>> - java version "1.8.0_71"
>>>>   Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
>>>>   Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
>>>>
>>>>
>>>> I noticed that on non-zero ranks, saveMem is allocated at each iteration.
>>>> Ideally, the garbage collector can take care of that, and this should
>>>> not be an issue.
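>>>>
>>>> Just in case, here is an untested sketch (based on your posted loop) that
>>>> reuses the receive buffer instead of reallocating it every iteration:
>>>>
>>>>     // reuse saveMem across iterations; only grow it when needed
>>>>     if (saveMem == null || saveMem.length < lengthData[0]) {
>>>>         saveMem = new byte[lengthData[0]];
>>>>     }
>>>>     MPI.COMM_WORLD.bcast(saveMem, lengthData[0], MPI.BYTE, 0);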
>>>>
>>>> would you mind giving the attached file a try ?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:
>>>>
>>>> I will have a look at it today
>>>>
>>>> how did you configure Open MPI ?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Thursday, July 7, 2016, Gundram Leifert <
>>>> gundram.leif...@uni-rostock.de> wrote:
>>>>
>>>>> Hello Gilles,
>>>>>
>>>>> thank you for your hints! I did 3 changes; unfortunately the same
>>>>> error occurs:
>>>>>
>>>>> update ompi:
>>>>> commit ae8444682f0a7aa158caea08800542ce9874455e
>>>>> Author: Ralph Castain <r...@open-mpi.org>
>>>>> Date:   Tue Jul 5 20:07:16 2016 -0700
>>>>>
>>>>> update java:
>>>>> java version "1.8.0_92"
>>>>> Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
>>>>> Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
>>>>>
>>>>> delete the hashcode lines.
>>>>>
>>>>> Now I get this error message 100% of the time, after a varying number of
>>>>> iterations (15-300):
>>>>>
>>>>>  0/ 3:length = 100000000
>>>>>  0/ 3:bcast length done (length = 100000000)
>>>>>  1/ 3:bcast length done (length = 100000000)
>>>>>  2/ 3:bcast length done (length = 100000000)
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  SIGSEGV (0xb) at pc=0x00002b3d022fcd24, pid=16578,
>>>>> tid=0x00002b3d29716700
>>>>> #
>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build
>>>>> 1.8.0_92-b14)
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode
>>>>> linux-amd64 compressed oops)
>>>>> # Problematic frame:
>>>>> # V  [libjvm.so+0x414d24]  ciEnv::get_field_by_index(ciInstanceKlass*,
>>>>> int)+0x94
>>>>> #
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>> # An error report file with more information is saved as:
>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid16578.log
>>>>> #
>>>>> # Compiler replay data is saved as:
>>>>> # /home/gl069/ompi/bin/executor/replay_pid16578.log
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>>>> #
>>>>> [titan01:16578] *** Process received signal ***
>>>>> [titan01:16578] Signal: Aborted (6)
>>>>> [titan01:16578] Signal code:  (-6)
>>>>> [titan01:16578] [ 0]
>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
>>>>> [titan01:16578] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
>>>>> [titan01:16578] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
>>>>> [titan01:16578] [ 3]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
>>>>> [titan01:16578] [ 4]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
>>>>> [titan01:16578] [ 5]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
>>>>> [titan01:16578] [ 6]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
>>>>> [titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
>>>>> [titan01:16578] [ 8]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
>>>>> [titan01:16578] [ 9]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae]
>>>>> [titan01:16578] [10]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade]
>>>>> [titan01:16578] [11]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0]
>>>>> [titan01:16578] [12]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>>>> [titan01:16578] [13]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>>>> [titan01:16578] [14]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>>>> [titan01:16578] [15]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>>>> [titan01:16578] [16]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>>>> [titan01:16578] [17]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>>>>> [titan01:16578] [18]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
>>>>> [titan01:16578] [19]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
>>>>> [titan01:16578] [20]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36e412)[0x2b3d02256412]
>>>>> [titan01:16578] [21]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36ed8d)[0x2b3d02256d8d]
>>>>> [titan01:16578] [22]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3708c2)[0x2b3d022588c2]
>>>>> [titan01:16578] [23]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3724e7)[0x2b3d0225a4e7]
>>>>> [titan01:16578] [24]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a817)[0x2b3d02262817]
>>>>> [titan01:16578] [25]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37a92f)[0x2b3d0226292f]
>>>>> [titan01:16578] [26]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x358edb)[0x2b3d02240edb]
>>>>> [titan01:16578] [27]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35929e)[0x2b3d0224129e]
>>>>> [titan01:16578] [28]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3593ce)[0x2b3d022413ce]
>>>>> [titan01:16578] [29]
>>>>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x35973e)[0x2b3d0224173e]
>>>>> [titan01:16578] *** End of error message ***
>>>>> -------------------------------------------------------
>>>>> Primary job  terminated normally, but 1 process returned
>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that process rank 2 with PID 0 on node titan01 exited
>>>>> on signal 6 (Aborted).
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> I don't know if it is a problem of Java or OMPI, but in the last years
>>>>> Java worked with no problems on my machine...
>>>>>
>>>>> Thank you for your tips in advance!
>>>>> Gundram
>>>>>
>>>>> On 07/06/2016 03:10 PM, Gilles Gouaillardet wrote:
>>>>>
>>>>> Note a race condition in MPI_Init was fixed yesterday in master.
>>>>> Can you please update your Open MPI and try again?
>>>>>
>>>>> hopefully the hang will disappear.
>>>>>
>>>>> Can you reproduce the crash with a simpler (and ideally deterministic)
>>>>> version of your program?
>>>>> The crash occurs in hashcode, and this makes little sense to me. Can
>>>>> you also update your JDK?
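>>>>>
>>>>> Something along these lines (an untested sketch of what I mean by
>>>>> "simpler and deterministic": fixed payload, no Random, no hashcode; the
>>>>> class name is made up) would help narrow it down:
>>>>>
>>>>> import mpi.*;
>>>>>
>>>>> public class MinBcast {
>>>>>     public static void main(String[] args) throws MPIException {
>>>>>         MPI.Init(args);
>>>>>         // same 100 MB payload as your test, but with fixed contents
>>>>>         byte[] buf = new byte[100000000];
>>>>>         for (int i = 0; i < 1000; i++) {
>>>>>             MPI.COMM_WORLD.bcast(buf, buf.length, MPI.BYTE, 0);
>>>>>             if (MPI.COMM_WORLD.getRank() == 0) {
>>>>>                 System.err.println("i = " + i);
>>>>>             }
>>>>>         }
>>>>>         MPI.Finalize();
>>>>>     }
>>>>> }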
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Wednesday, July 6, 2016, Gundram Leifert <
>>>>> gundram.leif...@uni-rostock.de> wrote:
>>>>>
>>>>>> Hello Jason,
>>>>>>
>>>>>> thanks for your response! I think it is another problem. I try to
>>>>>> send 100 MB of bytes, so there are not many tries (between 10 and 30). I
>>>>>> realized that the execution of this code can result in 3 different errors:
>>>>>>
>>>>>> 1. Most often the posted error message occurs.
>>>>>>
>>>>>> 2. In <10% of the cases I have a livelock. I can see 3 Java processes,
>>>>>> one with 200% and two with 100% processor utilization. After ~15 minutes
>>>>>> without new system output, this error occurs:
>>>>>>
>>>>>>
>>>>>> [thread 47499823949568 also had an error]
>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>> #
>>>>>> #  Internal Error (safepoint.cpp:317), pid=24256, tid=47500347131648
>>>>>> #  guarantee(PageArmed == 0) failed: invariant
>>>>>> #
>>>>>> # JRE version: 7.0_25-b15
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
>>>>>> linux-amd64 compressed oops)
>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>> #
>>>>>> # An error report file with more information is saved as:
>>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid24256.log
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>> #
>>>>>> [titan01:24256] *** Process received signal ***
>>>>>> [titan01:24256] Signal: Aborted (6)
>>>>>> [titan01:24256] Signal code:  (-6)
>>>>>> [titan01:24256] [ 0]
>>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2b336a324100]
>>>>>> [titan01:24256] [ 1]
>>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2b336a9815f7]
>>>>>> [titan01:24256] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b336a982ce8]
>>>>>> [titan01:24256] [ 3]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b336b44fac5]
>>>>>> [titan01:24256] [ 4]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b336b5af137]
>>>>>> [titan01:24256] [ 5]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x407262)[0x2b336b114262]
>>>>>> [titan01:24256] [ 6]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x7c6c34)[0x2b336b4d3c34]
>>>>>> [titan01:24256] [ 7]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a9c17)[0x2b336b5b6c17]
>>>>>> [titan01:24256] [ 8]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8aa2c0)[0x2b336b5b72c0]
>>>>>> [titan01:24256] [ 9]
>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x744270)[0x2b336b451270]
>>>>>> [titan01:24256] [10]
>>>>>> /usr/lib64/libpthread.so.0(+0x7dc5)[0x2b336a31cdc5]
>>>>>> [titan01:24256] [11] /usr/lib64/libc.so.6(clone+0x6d)[0x2b336aa4228d]
>>>>>> [titan01:24256] *** End of error message ***
>>>>>> -------------------------------------------------------
>>>>>> Primary job  terminated normally, but 1 process returned
>>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>>> -------------------------------------------------------
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 0 on node titan01 exited
>>>>>> on signal 6 (Aborted).
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> 3. In <10% of the cases I have a deadlock during MPI.Init. It hangs
>>>>>> for more than 15 minutes without returning an error message...
>>>>>>
>>>>>> Can I enable some debug flags to see what happens on the C / Open MPI side?
>>>>>>
>>>>>> Thanks in advance for your help!
>>>>>> Gundram Leifert
>>>>>>
>>>>>>
>>>>>> On 07/05/2016 06:05 PM, Jason Maldonis wrote:
>>>>>>
>>>>>> After reading your thread, it looks like it may be related to an issue I
>>>>>> had a few weeks ago (I'm a novice though). Maybe my thread will be of
>>>>>> help:
>>>>>> https://www.open-mpi.org/community/lists/users/2016/06/29425.php
>>>>>>
>>>>>> When you say "After a specific number of repetitions the process
>>>>>> either hangs up or returns with a SIGSEGV," do you mean that a single
>>>>>> call hangs, or that at some point during the for loop a call hangs? If you
>>>>>> mean the latter, then it might relate to my issue. Otherwise my thread
>>>>>> probably won't be helpful.
>>>>>>
>>>>>> Jason Maldonis
>>>>>> Research Assistant of Professor Paul Voyles
>>>>>> Materials Science Grad Student
>>>>>> University of Wisconsin, Madison
>>>>>> 1509 University Ave, Rm M142
>>>>>> Madison, WI 53706
>>>>>> maldo...@wisc.edu
>>>>>> 608-295-5532
>>>>>>
>>>>>> On Tue, Jul 5, 2016 at 9:58 AM, Gundram Leifert <
>>>>>> gundram.leif...@uni-rostock.de> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I try to send many byte arrays via broadcast. After a specific
>>>>>>> number of repetitions the process either hangs up or returns with a
>>>>>>> SIGSEGV. Can anyone help me solve the problem?
>>>>>>>
>>>>>>> ########## The code:
>>>>>>>
>>>>>>> import java.util.Random;
>>>>>>> import mpi.*;
>>>>>>>
>>>>>>> public class TestSendBigFiles {
>>>>>>>
>>>>>>>     public static void log(String msg) {
>>>>>>>         try {
>>>>>>>             System.err.println(String.format("%2d/%2d:%s",
>>>>>>> MPI.COMM_WORLD.getRank(), MPI.COMM_WORLD.getSize(), msg));
>>>>>>>         } catch (MPIException ex) {
>>>>>>>             System.err.println(String.format("%2s/%2s:%s", "?", "?",
>>>>>>> msg));
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     private static int hashcode(byte[] bytearray) {
>>>>>>>         if (bytearray == null) {
>>>>>>>             return 0;
>>>>>>>         }
>>>>>>>         int hash = 39;
>>>>>>>         for (int i = 0; i < bytearray.length; i++) {
>>>>>>>             byte b = bytearray[i];
>>>>>>>             hash = hash * 7 + (int) b;
>>>>>>>         }
>>>>>>>         return hash;
>>>>>>>     }
>>>>>>>
>>>>>>>     public static void main(String args[]) throws MPIException {
>>>>>>>         log("start main");
>>>>>>>         MPI.Init(args);
>>>>>>>         try {
>>>>>>>             log("initialized done");
>>>>>>>             byte[] saveMem = new byte[100000000];
>>>>>>>             MPI.COMM_WORLD.barrier();
>>>>>>>             Random r = new Random();
>>>>>>>             r.nextBytes(saveMem);
>>>>>>>             if (MPI.COMM_WORLD.getRank() == 0) {
>>>>>>>                 for (int i = 0; i < 1000; i++) {
>>>>>>>                     saveMem[r.nextInt(saveMem.length)]++;
>>>>>>>                     log("i = " + i);
>>>>>>>                     int[] lengthData = new int[]{saveMem.length};
>>>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>>>                     log("length = " + lengthData[0]);
>>>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>>>                     log("bcast length done (length = " +
>>>>>>> lengthData[0] + ")");
>>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>>                     MPI.COMM_WORLD.bcast(saveMem, lengthData[0],
>>>>>>> MPI.BYTE, 0);
>>>>>>>                     log("bcast data done");
>>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>>                 }
>>>>>>>                 MPI.COMM_WORLD.bcast(new int[]{0}, 1, MPI.INT, 0);
>>>>>>>             } else {
>>>>>>>                 while (true) {
>>>>>>>                     int[] lengthData = new int[1];
>>>>>>>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>>>>>>>                     log("bcast length done (length = " +
>>>>>>> lengthData[0] + ")");
>>>>>>>                     if (lengthData[0] == 0) {
>>>>>>>                         break;
>>>>>>>                     }
>>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>>                     saveMem = new byte[lengthData[0]];
>>>>>>>                     MPI.COMM_WORLD.bcast(saveMem, saveMem.length,
>>>>>>> MPI.BYTE, 0);
>>>>>>>                     log("bcast data done");
>>>>>>>                     MPI.COMM_WORLD.barrier();
>>>>>>>                     log("object hash = " + hashcode(saveMem));
>>>>>>>                 }
>>>>>>>             }
>>>>>>>             MPI.COMM_WORLD.barrier();
>>>>>>>         } catch (MPIException ex) {
>>>>>>>             System.out.println("caugth error." + ex);
>>>>>>>             log(ex.getMessage());
>>>>>>>         } catch (RuntimeException ex) {
>>>>>>>             System.out.println("caugth error." + ex);
>>>>>>>             log(ex.getMessage());
>>>>>>>         } finally {
>>>>>>>             MPI.Finalize();
>>>>>>>         }
>>>>>>>
>>>>>>>     }
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ############ The Error (if it does not just hang up):
>>>>>>>
>>>>>>> #
>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>> #
>>>>>>> #  SIGSEGV (0xb) at pc=0x00002b7e9c86e3a1, pid=1172,
>>>>>>> tid=47822674495232
>>>>>>> #
>>>>>>> #
>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>> # JRE version: 7.0_25-b15
>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
>>>>>>> linux-amd64 compressed oops)
>>>>>>> # Problematic frame:
>>>>>>> # #
>>>>>>> #  SIGSEGV (0xb) at pc=0x00002af69c0693a1, pid=1173,
>>>>>>> tid=47238546896640
>>>>>>> #
>>>>>>> # JRE version: 7.0_25-b15
>>>>>>> J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>>>> #
>>>>>>> # Failed to write core dump. Core dumps have been disabled. To
>>>>>>> enable core dumping, try "ulimit -c unlimited" before starting Java 
>>>>>>> again
>>>>>>> #
>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
>>>>>>> linux-amd64 compressed oops)
>>>>>>> # Problematic frame:
>>>>>>> # J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
>>>>>>> #
>>>>>>> # Failed to write core dump. Core dumps have been disabled. To
>>>>>>> enable core dumping, try "ulimit -c unlimited" before starting Java 
>>>>>>> again
>>>>>>> #
>>>>>>> # An error report file with more information is saved as:
>>>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1172.log
>>>>>>> # An error report file with more information is saved as:
>>>>>>> # /home/gl069/ompi/bin/executor/hs_err_pid1173.log
>>>>>>> #
>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>> #    http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>> #
>>>>>>> #
>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>> #    http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>> #
>>>>>>> [titan01:01172] *** Process received signal ***
>>>>>>> [titan01:01172] Signal: Aborted (6)
>>>>>>> [titan01:01172] Signal code:  (-6)
>>>>>>> [titan01:01173] *** Process received signal ***
>>>>>>> [titan01:01173] Signal: Aborted (6)
>>>>>>> [titan01:01173] Signal code:  (-6)
>>>>>>> [titan01:01172] [ 0]
>>>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2b7e9596a100]
>>>>>>> [titan01:01172] [ 1]
>>>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2b7e95fc75f7]
>>>>>>> [titan01:01172] [ 2]
>>>>>>> /usr/lib64/libc.so.6(abort+0x148)[0x2b7e95fc8ce8]
>>>>>>> [titan01:01172] [ 3]
>>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b7e96a95ac5]
>>>>>>> [titan01:01172] [ 4]
>>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b7e96bf5137]
>>>>>>> [titan01:01172] [ 5]
>>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2b7e96a995e0]
>>>>>>> [titan01:01172] [ 6] [titan01:01173] [ 0]
>>>>>>> /usr/lib64/libpthread.so.0(+0xf100)[0x2af694ded100]
>>>>>>> [titan01:01173] [ 1] /usr/lib64/libc.so.6(+0x35670)[0x2b7e95fc7670]
>>>>>>> [titan01:01172] [ 7] [0x2b7e9c86e3a1]
>>>>>>> [titan01:01172] *** End of error message ***
>>>>>>> /usr/lib64/libc.so.6(gsignal+0x37)[0x2af69544a5f7]
>>>>>>> [titan01:01173] [ 2]
>>>>>>> /usr/lib64/libc.so.6(abort+0x148)[0x2af69544bce8]
>>>>>>> [titan01:01173] [ 3]
>>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2af695f18ac5]
>>>>>>> [titan01:01173] [ 4]
>>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2af696078137]
>>>>>>> [titan01:01173] [ 5]
>>>>>>> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2af695f1c5e0]
>>>>>>> [titan01:01173] [ 6] /usr/lib64/libc.so.6(+0x35670)[0x2af69544a670]
>>>>>>> [titan01:01173] [ 7] [0x2af69c0693a1]
>>>>>>> [titan01:01173] *** End of error message ***
>>>>>>> -------------------------------------------------------
>>>>>>> Primary job  terminated normally, but 1 process returned
>>>>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>>>>> -------------------------------------------------------
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that process rank 1 with PID 0 on node titan01 exited
>>>>>>> on signal 6 (Aborted).
>>>>>>>
>>>>>>>
>>>>>>> ########CONFIGURATION:
>>>>>>> I used the ompi master sources from github:
>>>>>>> commit 267821f0dd405b5f4370017a287d9a49f92e734a
>>>>>>> Author: Gilles Gouaillardet <gil...@rist.or.jp>
>>>>>>> Date:   Tue Jul 5 13:47:50 2016 +0900
>>>>>>>
>>>>>>> ./configure --enable-mpi-java
>>>>>>> --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25 --disable-dlopen
>>>>>>> --disable-mca-dso
>>>>>>>
>>>>>>> Thanks a lot for your help!
>>>>>>> Gundram
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
