After reading your thread, it looks like it may be related to an issue I had
a few weeks ago (I'm a novice though). Maybe my thread will be of help:
https://www.open-mpi.org/community/lists/users/2016/06/29425.php

When you say "After a specific number of repetitions the process either
hangs up or returns with a SIGSEGV," do you mean that a single call hangs,
or that at some point during the for loop a call hangs? If you mean the
latter, then it might relate to my issue. Otherwise my thread probably
won't be helpful.
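
If it is the loop that eventually dies, one quick experiment might be to
take the payload off the Java heap and see whether the behavior changes.
Below is a rough, untested sketch, just a guess from a fellow user and not
a known fix: the class name is made up, and I'm assuming the master-branch
Java bindings expose MPI.newByteBuffer and accept a direct ByteBuffer in
bcast.

    import java.nio.ByteBuffer;
    import java.util.Random;
    import mpi.*;

    public class BcastDirectBufferTest {

        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            try {
                int rank = MPI.COMM_WORLD.getRank();
                int len = 100000000;             // same 100 MB payload as your test
                // Direct (off-heap) buffer: the GC cannot move it while MPI uses it.
                ByteBuffer buf = MPI.newByteBuffer(len);
                if (rank == 0) {
                    byte[] tmp = new byte[len];
                    new Random().nextBytes(tmp);
                    buf.put(tmp);
                    buf.clear();                 // reset position to 0 before broadcasting
                }
                for (int i = 0; i < 1000; i++) {
                    MPI.COMM_WORLD.bcast(buf, len, MPI.BYTE, 0);
                    System.err.println(rank + ": iteration " + i + " done");
                    MPI.COMM_WORLD.barrier();
                }
            } finally {
                MPI.Finalize();
            }
        }
    }

If that version survives all 1000 iterations while the byte[] version
crashes, it would at least narrow the problem down to how the bindings
handle heap arrays; if it fails the same way, you can probably ignore this.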

Jason Maldonis
Research Assistant of Professor Paul Voyles
Materials Science Grad Student
University of Wisconsin, Madison
1509 University Ave, Rm M142
Madison, WI 53706
maldo...@wisc.edu
608-295-5532

On Tue, Jul 5, 2016 at 9:58 AM, Gundram Leifert <gundram.leif...@uni-rostock.de> wrote:

> Hello,
>
> I am trying to send many byte arrays via broadcast. After a specific number of
> repetitions the process either hangs up or returns with a SIGSEGV. Can anyone
> help me solve the problem?
>
> ########## The code:
>
> import java.util.Random;
> import mpi.*;
>
> public class TestSendBigFiles {
>
>     public static void log(String msg) {
>         try {
>             System.err.println(String.format("%2d/%2d:%s", MPI.COMM_WORLD.getRank(), MPI.COMM_WORLD.getSize(), msg));
>         } catch (MPIException ex) {
>             System.err.println(String.format("%2s/%2s:%s", "?", "?", msg));
>         }
>     }
>
>     private static int hashcode(byte[] bytearray) {
>         if (bytearray == null) {
>             return 0;
>         }
>         int hash = 39;
>         for (int i = 0; i < bytearray.length; i++) {
>             byte b = bytearray[i];
>             hash = hash * 7 + (int) b;
>         }
>         return hash;
>     }
>
>     public static void main(String args[]) throws MPIException {
>         log("start main");
>         MPI.Init(args);
>         try {
>             log("initialized done");
>             byte[] saveMem = new byte[100000000];
>             MPI.COMM_WORLD.barrier();
>             Random r = new Random();
>             r.nextBytes(saveMem);
>             if (MPI.COMM_WORLD.getRank() == 0) {
>                 for (int i = 0; i < 1000; i++) {
>                     saveMem[r.nextInt(saveMem.length)]++;
>                     log("i = " + i);
>                     int[] lengthData = new int[]{saveMem.length};
>                     log("object hash = " + hashcode(saveMem));
>                     log("length = " + lengthData[0]);
>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>                     log("bcast length done (length = " + lengthData[0] + ")");
>                     MPI.COMM_WORLD.barrier();
>                     MPI.COMM_WORLD.bcast(saveMem, lengthData[0], MPI.BYTE, 0);
>                     log("bcast data done");
>                     MPI.COMM_WORLD.barrier();
>                 }
>                 MPI.COMM_WORLD.bcast(new int[]{0}, 1, MPI.INT, 0);
>             } else {
>                 while (true) {
>                     int[] lengthData = new int[1];
>                     MPI.COMM_WORLD.bcast(lengthData, 1, MPI.INT, 0);
>                     log("bcast length done (length = " + lengthData[0] + ")");
>                     if (lengthData[0] == 0) {
>                         break;
>                     }
>                     MPI.COMM_WORLD.barrier();
>                     saveMem = new byte[lengthData[0]];
>                     MPI.COMM_WORLD.bcast(saveMem, saveMem.length, MPI.BYTE, 0);
>                     log("bcast data done");
>                     MPI.COMM_WORLD.barrier();
>                     log("object hash = " + hashcode(saveMem));
>                 }
>             }
>             MPI.COMM_WORLD.barrier();
>         } catch (MPIException ex) {
>             System.out.println("caught error: " + ex);
>             log(ex.getMessage());
>         } catch (RuntimeException ex) {
>             System.out.println("caught error: " + ex);
>             log(ex.getMessage());
>         } finally {
>             MPI.Finalize();
>         }
>
>     }
>
> }
>
>
> ############ The Error (if it does not just hang up):
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00002b7e9c86e3a1, pid=1172, tid=47822674495232
> #
> #
> # A fatal error has been detected by the Java Runtime Environment:
> # JRE version: 7.0_25-b15
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
> linux-amd64 compressed oops)
> # Problematic frame:
> # #
> #  SIGSEGV (0xb) at pc=0x00002af69c0693a1, pid=1173, tid=47238546896640
> #
> # JRE version: 7.0_25-b15
> J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
> linux-amd64 compressed oops)
> # Problematic frame:
> # J  de.uros.citlab.executor.test.TestSendBigFiles.hashcode([B)I
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/gl069/ompi/bin/executor/hs_err_pid1172.log
> # An error report file with more information is saved as:
> # /home/gl069/ompi/bin/executor/hs_err_pid1173.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> #
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> #
> [titan01:01172] *** Process received signal ***
> [titan01:01172] Signal: Aborted (6)
> [titan01:01172] Signal code:  (-6)
> [titan01:01173] *** Process received signal ***
> [titan01:01173] Signal: Aborted (6)
> [titan01:01173] Signal code:  (-6)
> [titan01:01172] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b7e9596a100]
> [titan01:01172] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b7e95fc75f7]
> [titan01:01172] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b7e95fc8ce8]
> [titan01:01172] [ 3]
> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2b7e96a95ac5]
> [titan01:01172] [ 4]
> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2b7e96bf5137]
> [titan01:01172] [ 5]
> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2b7e96a995e0]
> [titan01:01172] [ 6] [titan01:01173] [ 0]
> /usr/lib64/libpthread.so.0(+0xf100)[0x2af694ded100]
> [titan01:01173] [ 1] /usr/lib64/libc.so.6(+0x35670)[0x2b7e95fc7670]
> [titan01:01172] [ 7] [0x2b7e9c86e3a1]
> [titan01:01172] *** End of error message ***
> /usr/lib64/libc.so.6(gsignal+0x37)[0x2af69544a5f7]
> [titan01:01173] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2af69544bce8]
> [titan01:01173] [ 3]
> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x742ac5)[0x2af695f18ac5]
> [titan01:01173] [ 4]
> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(+0x8a2137)[0x2af696078137]
> [titan01:01173] [ 5]
> /home/gl069/bin/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x140)[0x2af695f1c5e0]
> [titan01:01173] [ 6] /usr/lib64/libc.so.6(+0x35670)[0x2af69544a670]
> [titan01:01173] [ 7] [0x2af69c0693a1]
> [titan01:01173] *** End of error message ***
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 0 on node titan01 exited on
> signal 6 (Aborted).
>
>
> ########CONFIGURATION:
> I used the Open MPI master sources from GitHub:
> commit 267821f0dd405b5f4370017a287d9a49f92e734a
> Author: Gilles Gouaillardet <gil...@rist.or.jp>
> Date:   Tue Jul 5 13:47:50 2016 +0900
>
> ./configure --enable-mpi-java --with-jdk-dir=/home/gl069/bin/jdk1.7.0_25
> --disable-dlopen --disable-mca-dso
>
> Thanks a lot for your help!
> Gundram
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/07/29584.php
>
