Hi, today I tested some small Java programs with openmpi-dev-178-ga16c1e4. One program throws an ArrayIndexOutOfBoundsException. It worked fine with older Open MPI versions, e.g., openmpi-1.8.2a1r31804.
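The program follows essentially this send/receive pattern (a simplified sketch, not the attached MsgSendRecvMain.java verbatim; the class name, the hostname payload, and the probe/getCount sequence are illustrative assumptions based on the output below):

import mpi.*;
import java.net.InetAddress;

public class MsgSendRecvSketch {   // illustrative name, not the attached file
  public static void main(String[] args) throws Exception {
    MPI.Init(args);
    final int TAG = 3;
    int myRank = MPI.COMM_WORLD.getRank();
    if (myRank != 0) {
      /* every non-root process sends its hostname to process 0 */
      char[] msg = InetAddress.getLocalHost().getHostName().toCharArray();
      MPI.COMM_WORLD.send(msg, msg.length, MPI.CHAR, 0, TAG);
    } else {
      /* process 0 probes for the message size, allocates a matching
         buffer, and receives the greeting */
      Status status = MPI.COMM_WORLD.probe(MPI.ANY_SOURCE, TAG);
      int length = status.getCount(MPI.CHAR);
      char[] buf = new char[length];
      MPI.COMM_WORLD.recv(buf, length, MPI.CHAR, status.getSource(), TAG);
      System.out.println("Greetings from process " + status.getSource() + ":\n" +
                         "  message tag:    " + status.getTag() + "\n" +
                         "  message length: " + length + "\n" +
                         "  message:        " + new String(buf));
    }
    MPI.Finalize();
  }
}

I compiled with mpijavac and started the program with mpiexec as shown below.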
tyr java 138 mpiexec -np 2 java MsgSendRecvMain
Now 1 process sends its greetings.
Greetings from process 1:
message tag: 3
message length: 26
message: tyr.informatik.hs-fulda.de?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at mpi.Comm.recv(Native Method)
at mpi.Comm.recv(Comm.java:391)
at MsgSendRecvMain.main(MsgSendRecvMain.java:92)
...
The exception also occurs on my Linux box.
linpc1 java 102 mpijavac MsgSendRecvMain.java
linpc1 java 103 mpiexec -np 2 java MsgSendRecvMain
Now 1 process sends its greetings.
Greetings from process 1:
message tag: 3
message length: 6
message: linpc1?????%???%?????%?f?%?%???$??????????
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at mpi.Comm.recv(Native Method)
at mpi.Comm.recv(Comm.java:391)
at MsgSendRecvMain.main(MsgSendRecvMain.java:92)
...
tyr java 139 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
...
(gdb) run -np 2 java MsgSendRecvMain
Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 2 java MsgSendRecvMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
Now 1 process sends its greetings.
Greetings from process 1:
message tag: 3
message length: 26
message: tyr.informatik.hs-fulda.de
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at mpi.Comm.recv(Native Method)
at mpi.Comm.recv(Comm.java:391)
at MsgSendRecvMain.main(MsgSendRecvMain.java:92)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[61564,1],1]
Exit code: 1
--------------------------------------------------------------------------
[LWP 2 exited]
[New Thread 2 ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) bt
#0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8  0xffffffff7ec87ca0 in vm_close ()
    from /usr/local/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0
#9  0xffffffff7ec85274 in lt_dlclose ()
    from /usr/local/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0
#10 0xffffffff7ecaa5dc in ri_destructor (obj=0x100187b70)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/mca/base/mca_base_component_repository.c:382
#11 0xffffffff7eca8fd8 in opal_obj_run_destructors (object=0x100187b70)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/class/opal_object.h:446
#12 0xffffffff7eca9eac in mca_base_component_repository_release (component=0xffffffff7b1236f0 <mca_oob_tcp_component>)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/mca/base/mca_base_component_repository.c:240
#13 0xffffffff7ecac17c in mca_base_component_unload (component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/mca/base/mca_base_components_close.c:47
#14 0xffffffff7ecac210 in mca_base_component_close (component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/mca/base/mca_base_components_close.c:60
#15 0xffffffff7ecac2e4 in mca_base_components_close (output_id=-1, components=0xffffffff7f14bc58 <orte_oob_base_framework+80>, skip=0x0)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/mca/base/mca_base_components_close.c:86
#16 0xffffffff7ecac24c in mca_base_framework_components_close (framework=0xffffffff7f14bc08 <orte_oob_base_framework>, skip=0x0)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/mca/base/mca_base_components_close.c:66
#17 0xffffffff7efcaf80 in orte_oob_base_close ()
    at ../../../../openmpi-dev-178-ga16c1e4/orte/mca/oob/base/oob_base_frame.c:112
#18 0xffffffff7ecc0d74 in mca_base_framework_close (framework=0xffffffff7f14bc08 <orte_oob_base_framework>)
    at ../../../../openmpi-dev-178-ga16c1e4/opal/mca/base/mca_base_framework.c:187
#19 0xffffffff7be07858 in rte_finalize ()
    at ../../../../../openmpi-dev-178-ga16c1e4/orte/mca/ess/hnp/ess_hnp_module.c:857
#20 0xffffffff7ef338bc in orte_finalize ()
    at ../../openmpi-dev-178-ga16c1e4/orte/runtime/orte_finalize.c:66
#21 0x000000010000723c in orterun (argc=5, argv=0xffffffff7fffe0d8)
    at ../../../../openmpi-dev-178-ga16c1e4/orte/tools/orterun/orterun.c:1103
#22 0x0000000100003e80 in main (argc=5, argv=0xffffffff7fffe0d8)
    at ../../../../openmpi-dev-178-ga16c1e4/orte/tools/orterun/main.c:13
(gdb)
Hopefully the problem has nothing to do with my program. I would be
grateful if somebody (Oscar?) could fix it. Thank you very much in
advance for any help.
Kind regards
Siegmar
Attachment: MsgSendRecvMain.java