Hi

Today I implemented a small Java program to multiply two matrices.
Once again, the program only works if you simulate a 2-dimensional
array in a 1-dimensional one. The program works on Solaris 10 Sparc
and x86_64. It breaks on Linux x86_64 (openSuSE 12.1). Furthermore, it
breaks if I combine little-endian and big-endian machines.
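For readers less familiar with the layout: in Java a double[][] is an array of separately allocated row arrays, so it is not one contiguous buffer, whereas a flat double[] of size rows * cols is. The sketch below assumes row-major order, where element (i, j) is stored at index i * cols + j. The class and method names (FlatMatrix, index, flatten) are hypothetical and not taken from the attached programs.

```java
// Hypothetical sketch: row-major storage of an (nRows x nCols) matrix in a 1-D array.
public class FlatMatrix {
    // Element (i, j) lives at index i * nCols + j (row-major order assumed).
    static int index(int i, int j, int nCols) {
        return i * nCols + j;
    }

    // Copy a rectangular 2-D array into one contiguous 1-D array, row by row.
    static double[] flatten(double[][] m) {
        int nRows = m.length;
        int nCols = m[0].length;
        double[] flat = new double[nRows * nCols];
        for (int i = 0; i < nRows; i++) {
            System.arraycopy(m[i], 0, flat, i * nCols, nCols);
        }
        return flat;
    }

    public static void main(String[] args) {
        // Matrix a as printed in the output: 4 rows, 6 columns, values 1..24.
        double[][] a = new double[4][6];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 6; j++) {
                a[i][j] = i * 6 + j + 1;
            }
        }
        double[] flat = flatten(a);
        System.out.println(flat[index(2, 3, 6)]); // prints 16.0, i.e. a[2][3]
    }
}
```

A flat array like this can be handed to a communication call as a single buffer; a double[][] cannot, because its rows are independent objects.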

mpiexec -np 4 -host tyr java MatMultWithNproc2DarrayIn1DarrayMain    or
mpiexec -np 4 -host sunpc1 java MatMultWithNproc2DarrayIn1DarrayMain

Process 0 of 4 running on tyr.informatik.hs-fulda.de.
Process 1 of 4 running on tyr.informatik.hs-fulda.de.
Process 2 of 4 running on tyr.informatik.hs-fulda.de.
Process 3 of 4 running on tyr.informatik.hs-fulda.de.

(4,6)-matrix a:

      1.00      2.00      3.00      4.00      5.00      6.00
      7.00      8.00      9.00     10.00     11.00     12.00
     13.00     14.00     15.00     16.00     17.00     18.00
     19.00     20.00     21.00     22.00     23.00     24.00

(6,8)-matrix b:

     48.00     47.00     46.00     45.00     44.00     43.00     42.00     41.00
     40.00     39.00     38.00     37.00     36.00     35.00     34.00     33.00
     32.00     31.00     30.00     29.00     28.00     27.00     26.00     25.00
     24.00     23.00     22.00     21.00     20.00     19.00     18.00     17.00
     16.00     15.00     14.00     13.00     12.00     11.00     10.00      9.00
      8.00      7.00      6.00      5.00      4.00      3.00      2.00      1.00

(4,8)-result-matrix c = a * b:

    448.00    427.00    406.00    385.00    364.00    343.00    322.00    301.00
   1456.00   1399.00   1342.00   1285.00   1228.00   1171.00   1114.00   1057.00
   2464.00   2371.00   2278.00   2185.00   2092.00   1999.00   1906.00   1813.00
   3472.00   3343.00   3214.00   3085.00   2956.00   2827.00   2698.00   2569.00
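To check the distributed result independently of MPI, a sequential reference multiplication on the same row-major flat layout can be used. This is a hypothetical helper (MatMultCheck is not one of the attached files). It assumes a is (n x m), b is (m x p), and the matrices are filled exactly as printed above. Locale.ROOT is used so the decimal separator is a point regardless of the machine's locale.

```java
import java.util.Locale;

// Hypothetical sequential reference: c = a * b on row-major flat arrays.
public class MatMultCheck {
    // a is (n x m), b is (m x p), result c is (n x p); all stored row-major.
    static double[] multiply(double[] a, double[] b, int n, int m, int p) {
        double[] c = new double[n * p];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < m; k++) {
                double aik = a[i * m + k];
                for (int j = 0; j < p; j++) {
                    c[i * p + j] += aik * b[k * p + j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 4, m = 6, p = 8;
        double[] a = new double[n * m];
        double[] b = new double[m * p];
        // a holds 1..24 row by row; b holds 48 down to 1 row by row.
        for (int idx = 0; idx < n * m; idx++) a[idx] = idx + 1;
        for (int idx = 0; idx < m * p; idx++) b[idx] = m * p - idx;
        double[] c = multiply(a, b, n, m, p);
        for (int i = 0; i < n; i++) {
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < p; j++) {
                row.append(String.format(Locale.ROOT, "%10.2f", c[i * p + j]));
            }
            System.out.println(row);
        }
    }
}
```

Its four output rows should equal the result matrix c printed by the working single-architecture runs (for example 448.00 in the first and 2569.00 in the last position).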

mpiexec -np 4 -host linpc1 java MatMultWithNproc2DarrayIn1DarrayMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
...
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[(null):29256] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
...

mpiexec -np 4 -host tyr,sunpc1 java MatMultWithNproc2DarrayIn1DarrayMain
[tyr:20374] *** An error occurred in MPI_Comm_dup
[tyr:20374] *** reported by process [3921084417,0]
[tyr:20374] *** on communicator MPI_COMM_WORLD
[tyr:20374] *** MPI_ERR_INTERN: internal error
[tyr:20374] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[tyr:20374] ***    and potentially your MPI job)
[tyr.informatik.hs-fulda.de:20369] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[tyr.informatik.hs-fulda.de:20369] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
tyr java 270 
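As background on the mixed-endian case only (this does not diagnose the MPI_Comm_dup failure): the same double value has a different byte sequence on big-endian and little-endian machines, so any raw bytes exchanged between them must be converted somewhere. The sketch below uses only java.nio.ByteBuffer and java.nio.ByteOrder; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: byte layout of one double in big- vs little-endian order.
public class ByteOrderDemo {
    // Serialize a double into 8 bytes using the given byte order.
    static byte[] bytesOf(double value, ByteOrder order) {
        return ByteBuffer.allocate(Double.BYTES).order(order).putDouble(value).array();
    }

    // Render bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("native order:  " + ByteOrder.nativeOrder());
        System.out.println("big-endian:    " + hex(bytesOf(1.0, ByteOrder.BIG_ENDIAN)));
        System.out.println("little-endian: " + hex(bytesOf(1.0, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

For 1.0 this prints 3f f0 00 00 00 00 00 00 in big-endian order and 00 00 00 00 00 00 f0 3f in little-endian order; the native order line depends on the machine the program runs on.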


Any ideas why it breaks? Thank you very much for your help in advance.


Kind regards

Siegmar

Attachment: MatMultWithNproc2DarrayMain.java
Description: MatMultWithNproc2DarrayMain.java

Attachment: MatMultWithNproc2DarrayIn1DarrayMain.java
Description: MatMultWithNproc2DarrayIn1DarrayMain.java

Attachment: PrintArray.java
Description: PrintArray.java
