Hi, today I implemented a small Java program that multiplies two matrices. Once more, the program only works correctly if I simulate a 2-dimensional array in a 1-dimensional one. It works on Solaris 10 Sparc and x86_64, but it breaks on Linux x86_64 (openSUSE 12.1), and it also breaks if I combine little-endian and big-endian machines.
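To make clear what I mean by simulating the 2-dimensional arrays in 1-dimensional ones, here is a stripped-down sketch of the row-major indexing (plain Java, without the MPI calls; this is not the attached program, and the helper names are only for illustration). With this layout every row is contiguous in memory, so whole rows can be handed to the MPI operations as flat double[] buffers.

  // Sketch only: row-major "2-D array in a 1-D array" indexing.
  public class FlatMatrixSketch {

    // element (row, col) of a matrix with "cols" columns, stored row-major in a 1-D array
    static double get(double[] m, int cols, int row, int col) {
      return m[row * cols + col];
    }

    // c (p x n) = a (p x q) * b (q x n), all three stored as flat 1-D arrays
    static void matmulFlat(double[] a, double[] b, double[] c, int p, int q, int n) {
      for (int i = 0; i < p; i++) {
        for (int j = 0; j < n; j++) {
          double sum = 0.0;
          for (int k = 0; k < q; k++) {
            sum += a[i * q + k] * b[k * n + j];
          }
          c[i * n + j] = sum;
        }
      }
    }

    public static void main(String[] args) {
      final int P = 4, Q = 6, N = 8;
      double[] a = new double[P * Q];
      double[] b = new double[Q * N];
      double[] c = new double[P * N];
      for (int i = 0; i < a.length; i++) a[i] = i + 1;          // 1.00 .. 24.00
      for (int i = 0; i < b.length; i++) b[i] = b.length - i;   // 48.00 .. 1.00
      matmulFlat(a, b, c, P, Q, N);
      System.out.printf("c[0][0] = %.2f%n", get(c, N, 0, 0));   // 448.00, as in the output below
    }
  }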
mpiexec -np 4 -host tyr java MatMultWithNproc2DarrayIn1DarrayMain
  or
mpiexec -np 4 -host sunpc1 java MatMultWithNproc2DarrayIn1DarrayMain

Process 0 of 4 running on tyr.informatik.hs-fulda.de.
Process 1 of 4 running on tyr.informatik.hs-fulda.de.
Process 2 of 4 running on tyr.informatik.hs-fulda.de.
Process 3 of 4 running on tyr.informatik.hs-fulda.de.

(4,6)-matrix a:

    1.00    2.00    3.00    4.00    5.00    6.00
    7.00    8.00    9.00   10.00   11.00   12.00
   13.00   14.00   15.00   16.00   17.00   18.00
   19.00   20.00   21.00   22.00   23.00   24.00

(6,8)-matrix b:

   48.00   47.00   46.00   45.00   44.00   43.00   42.00   41.00
   40.00   39.00   38.00   37.00   36.00   35.00   34.00   33.00
   32.00   31.00   30.00   29.00   28.00   27.00   26.00   25.00
   24.00   23.00   22.00   21.00   20.00   19.00   18.00   17.00
   16.00   15.00   14.00   13.00   12.00   11.00   10.00    9.00
    8.00    7.00    6.00    5.00    4.00    3.00    2.00    1.00

(4,8)-result-matrix c = a * b:

  448.00  427.00  406.00  385.00  364.00  343.00  322.00  301.00
 1456.00 1399.00 1342.00 1285.00 1228.00 1171.00 1114.00 1057.00
 2464.00 2371.00 2278.00 2185.00 2092.00 1999.00 1906.00 1813.00
 3472.00 3343.00 3214.00 3085.00 2956.00 2827.00 2698.00 2569.00


mpiexec -np 4 -host linpc1 java MatMultWithNproc2DarrayIn1DarrayMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
...
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[(null):29256] Local abort before MPI_INIT completed successfully; not able to
aggregate error messages, and not able to guarantee that all other processes
were killed!
-------------------------------------------------------
...


mpiexec -np 4 -host tyr,sunpc1 java MatMultWithNproc2DarrayIn1DarrayMain
[tyr:20374] *** An error occurred in MPI_Comm_dup
[tyr:20374] *** reported by process [3921084417,0]
[tyr:20374] *** on communicator MPI_COMM_WORLD
[tyr:20374] *** MPI_ERR_INTERN: internal error
[tyr:20374] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[tyr:20374] ***    and potentially your MPI job)
[tyr.informatik.hs-fulda.de:20369] 1 more process has sent help message
help-mpi-errors.txt / mpi_errors_are_fatal
[tyr.informatik.hs-fulda.de:20369] Set MCA parameter "orte_base_help_aggregate"
to 0 to see all help / error messages
tyr java 270


Any ideas why it breaks?

Thank you very much for your help in advance.


Kind regards

Siegmar
Attachments:
  MatMultWithNproc2DarrayMain.java
  MatMultWithNproc2DarrayIn1DarrayMain.java
  PrintArray.java