Hi Siegmar,

I will review PR 1698 and wait for some more feedback from the developers;
they might have different views than mine.
Even assuming PR 1698 does what you expect, it does not catch all user
errors. For example, if you MPI_Send a buffer that is too short, the
exception might be thrown at any time. In the worst case it occurs in the
progress thread, outside of any MPI call, which means it cannot be
"converted" into an MPIException.
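
To make that concrete, here is a minimal sketch in the spirit of your test
case (the class name and the try/catch layout are my assumptions, since I
only have the stack trace and not the source of Exception_2_Main): the
buffer holds one int while the count claims two. Even with
MPI.ERRORS_RETURN, that mismatch currently surfaces as a
java.lang.ArrayIndexOutOfBoundsException from the native layer rather than
an MPIException, and an error detected in the progress thread would not
reach either catch block at all.

import mpi.*;

// Rough sketch only -- my reconstruction of what Exception_2_Main
// presumably does; class and variable names are made up.
public class BcastOutOfBounds {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        MPI.COMM_WORLD.setErrhandler(MPI.ERRORS_RETURN);
        int[] buf = new int[1];                      // buffer holds 1 int ...
        try {
            MPI.COMM_WORLD.bcast(buf, 2, MPI.INT, 0);  // ... but count is 2
        } catch (MPIException e) {
            // what one would like to trap with MPI.ERRORS_RETURN
            System.err.println("MPI error trapped: " + e.getMessage());
        } catch (RuntimeException e) {
            // what currently happens (ArrayIndexOutOfBoundsException), and
            // only if the error is raised on this thread; an error detected
            // in the progress thread never reaches this handler
            System.err.println("runtime exception: " + e);
        }
        MPI.Finalize();
    }
}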

FWIW, we do have a way to check buffers, but it requires that
1. Open MPI is configured with --enable-memchecker, and
2. the MPI tasks are run under valgrind.
IIRC, valgrind will issue an error message if the buffer is invalid, and
the app will crash afterwards
(i.e. the MPI subroutine will not return with an error code the end user
can "trap").
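
Roughly, that looks like the following (a sketch only: the dots stand for
whatever other configure arguments you normally use, my_c_app is just a
placeholder for a C test program, and running a Java MPI program under
valgrind is more involved because the JVM itself triggers a lot of
reports):

./configure --enable-memchecker ...
mpiexec -np 2 valgrind ./my_c_app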

Such checks might be easier to implement in Java, and the resulting errors
might easily be made "trappable" (see the sketch below), but as far as I am
concerned
1. this has a runtime overhead, and
2. this is a new development.
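
For illustration only, such a Java-side check could look roughly like the
sketch below. checkBuffer and everything around it are hypothetical names,
this is not what the current bindings do, and it assumes MPIException has a
suitable public constructor; it also only handles the int[] case, which
hints at both the runtime overhead and the amount of new code a complete
version would need.

import mpi.MPIException;

// Hypothetical sketch: a pre-check the Java layer could run before calling
// into native code, so that an undersized buffer becomes a trappable
// MPIException instead of an ArrayIndexOutOfBoundsException (or an error
// raised later in the progress thread).  checkBuffer() does not exist in
// the current bindings.
final class BufferCheck {
    static void checkBuffer(int[] buf, int offset, int count)
            throws MPIException {
        if (buf == null || offset < 0 || count < 0
                || offset + count > buf.length) {
            // assumes MPIException exposes a public String constructor
            throw new MPIException("buffer too small: length="
                    + (buf == null ? 0 : buf.length)
                    + ", offset=" + offset + ", count=" + count);
        }
    }
}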

Let's follow up at https://github.com/open-mpi/ompi/issues/1698 from now on.

Cheers,

Gilles

On Monday, August 29, 2016, Siegmar Gross <
siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi Gilles,
>
> Isn't it possible to pass all exceptions from the Java interface
> to the calling method? I can live with the current handling of
> exceptions as well, although some exceptions can be handled
> within my program while others will break my program even if I want
> to handle exceptions myself. I understood PR 1698 to mean that
> all exceptions can be processed in the user program if the user
> chooses MPI.ERRORS_RETURN (otherwise this change request wouldn't
> have been necessary). Nevertheless, if you decide that things stay as
> they are, I'm happy with your decision as well.
>
>
> Kind regards
>
> Siegmar
>
>
> Am 29.08.2016 um 10:30 schrieb Gilles Gouaillardet:
>
>> Siegmar and all,
>>
>>
>> I am puzzled by this error.
>>
>> On one hand, it is caused by an invalid buffer
>> (i.e. the buffer size is 1, but the user claims a size of 2),
>> so I am fine with the current behavior
>> (i.e. java.lang.ArrayIndexOutOfBoundsException is thrown).
>>
>> /* If this were a C program, it would very likely SIGSEGV; Open MPI
>> does not catch this kind of error when checking params. */
>>
>>
>> On the other hand, Open MPI could be enhanced to check the buffer size
>> and throw an MPIException in this case.
>>
>>
>> As far as I am concerned, this is a feature request and not a bug.
>>
>>
>> Thoughts, anyone?
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>> On 8/29/2016 3:48 PM, Siegmar Gross wrote:
>>
>>> Hi,
>>>
>>> I have installed v1.10.3-31-g35ba6a1, openmpi-v2.0.0-233-gb5f0a4f,
>>> and openmpi-dev-4691-g277c319 on my "SUSE Linux Enterprise Server
>>> 12 (x86_64)" with Sun C 5.14 beta and gcc-6.1.0. In May I had
>>> reported a problem with Java exceptions (PR 1698), which was
>>> solved in June (PR 1803).
>>>
>>> https://github.com/open-mpi/ompi/issues/1698
>>> https://github.com/open-mpi/ompi/pull/1803
>>>
>>> Unfortunately the problem still exists, or has reappeared,
>>> in all three branches.
>>>
>>>
>>> loki fd1026 112 ompi_info | grep -e "Open MPI repo revision" -e "C compiler absolute"
>>>   Open MPI repo revision: dev-4691-g277c319
>>>      C compiler absolute: /opt/solstudio12.5b/bin/cc
>>> loki fd1026 112 mpijavac Exception_2_Main.java
>>> warning: [path] bad path element "/usr/local/openmpi-master_64_cc/lib64/shmem.jar": no such file or directory
>>> 1 warning
>>> loki fd1026 113 mpiexec -np 1 java Exception_2_Main
>>> Set error handler for MPI.COMM_WORLD to MPI.ERRORS_RETURN.
>>> Call "bcast" with index out-of bounds.
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
>>>         at mpi.Comm.bcast(Native Method)
>>>         at mpi.Comm.bcast(Comm.java:1252)
>>>         at Exception_2_Main.main(Exception_2_Main.java:22)
>>> -------------------------------------------------------
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpiexec detected that one or more processes exited with non-zero status, thus causing
>>> the job to be terminated. The first process to do so was:
>>>
>>>   Process name: [[58548,1],0]
>>>   Exit code:    1
>>> --------------------------------------------------------------------------
>>> loki fd1026 114 exit
>>>
>>>
>>>
>>> loki fd1026 116 ompi_info | grep -e "Open MPI repo revision" -e "C compiler absolute"
>>>   Open MPI repo revision: v2.0.0-233-gb5f0a4f
>>>      C compiler absolute: /opt/solstudio12.5b/bin/cc
>>> loki fd1026 117 mpijavac Exception_2_Main.java
>>> warning: [path] bad path element "/usr/local/openmpi-2.0.1_64_cc/lib64/shmem.jar": no such file or directory
>>> 1 warning
>>> loki fd1026 118 mpiexec -np 1 java Exception_2_Main
>>> Set error handler for MPI.COMM_WORLD to MPI.ERRORS_RETURN.
>>> Call "bcast" with index out-of bounds.
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
>>>         at mpi.Comm.bcast(Native Method)
>>>         at mpi.Comm.bcast(Comm.java:1252)
>>>         at Exception_2_Main.main(Exception_2_Main.java:22)
>>> -------------------------------------------------------
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpiexec detected that one or more processes exited with non-zero status, thus causing
>>> the job to be terminated. The first process to do so was:
>>>
>>>   Process name: [[58485,1],0]
>>>   Exit code:    1
>>> --------------------------------------------------------------------------
>>> loki fd1026 119 exit
>>>
>>>
>>>
>>> loki fd1026 107 ompi_info | grep -e "Open MPI repo revision" -e "C compiler absolute"
>>>   Open MPI repo revision: v1.10.3-31-g35ba6a1
>>>      C compiler absolute: /opt/solstudio12.5b/bin/cc
>>> loki fd1026 107 mpijavac Exception_2_Main.java
>>> loki fd1026 108 mpiexec -np 1 java Exception_2_Main
>>> Set error handler for MPI.COMM_WORLD to MPI.ERRORS_RETURN.
>>> Call "bcast" with index out-of bounds.
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
>>>         at mpi.Comm.bcast(Native Method)
>>>         at mpi.Comm.bcast(Comm.java:1231)
>>>         at Exception_2_Main.main(Exception_2_Main.java:22)
>>> -------------------------------------------------------
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpiexec detected that one or more processes exited with non-zero status, thus causing
>>> the job to be terminated. The first process to do so was:
>>>
>>>   Process name: [[34400,1],0]
>>>   Exit code:    1
>>> --------------------------------------------------------------------------
>>> loki fd1026 109 exit
>>>
>>>
>>>
>>>
>>> I would be grateful if somebody could fix the problem. Thank you
>>> very much in advance for any help.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar