Hello,

I've just introduced the possibility of using Open MPI instead of MPICH in an ocean model. The code is quite well tested and has been run in various parallel setups by various groups.
I've compiled the program using mpif90 (instead of ifort). When I run it I get the error shown at the end of this mail. As you can see, all 13 processes are started, but then ...

One load-balancing problem with ocean models that use domain decomposition is that equally sized sub-domains do not carry the same computational burden, because the different sub-domains have different land fractions. To overcome this, a MATLAB tool has been developed that allows for assigning more sub-domains to one processor/core based on the sum of water points in the sub-domains. Attached is a figure showing the actual setup in this case.

The neighbor relation is read from a file produced by said MATLAB tool. Non-existing neighbors are set to -1, which is the value of MPI_PROC_NULL in MPICH.

The setup is run on a quad-core machine for testing purposes only.

Any ideas what goes wrong?

==== error ======
kb@gate:~/DK/setups/north_sea_fine$ mpirun -np 13 bin/getm_prod_IFORT.96x96
Process 0 of 13 is alive on gate
[gate:18564] *** An error occurred in MPI_Isend
[gate:18564] *** on communicator MPI_COMM_WORLD
[gate:18564] *** MPI_ERR_RANK: invalid rank
[gate:18564] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 1 of 13 is alive on gate
[gate:18565] *** An error occurred in MPI_Isend
[gate:18565] *** on communicator MPI_COMM_WORLD
[gate:18565] *** MPI_ERR_RANK: invalid rank
[gate:18565] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 2 of 13 is alive on gate
Process 3 of 13 is alive on gate
[gate:18567] *** An error occurred in MPI_Isend
[gate:18567] *** on communicator MPI_COMM_WORLD
[gate:18567] *** MPI_ERR_RANK: invalid rank
[gate:18567] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 4 of 13 is alive on gate
[gate:18568] *** An error occurred in MPI_Isend
[gate:18568] *** on communicator MPI_COMM_WORLD
[gate:18568] *** MPI_ERR_RANK: invalid rank
[gate:18568] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 5 of 13 is alive on gate
[gate:18569] *** An error occurred in MPI_Isend
[gate:18569] *** on communicator MPI_COMM_WORLD
[gate:18569] *** MPI_ERR_RANK: invalid rank
[gate:18569] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 7 of 13 is alive on gate
[gate:18571] *** An error occurred in MPI_Isend
[gate:18571] *** on communicator MPI_COMM_WORLD
[gate:18571] *** MPI_ERR_RANK: invalid rank
[gate:18571] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 8 of 13 is alive on gate
Process 9 of 13 is alive on gate
[gate:18573] *** An error occurred in MPI_Isend
[gate:18573] *** on communicator MPI_COMM_WORLD
[gate:18573] *** MPI_ERR_RANK: invalid rank
[gate:18573] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 10 of 13 is alive on gate
[gate:18574] *** An error occurred in MPI_Isend
[gate:18574] *** on communicator MPI_COMM_WORLD
[gate:18574] *** MPI_ERR_RANK: invalid rank
[gate:18574] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 11 of 13 is alive on gate
Process 12 of 13 is alive on gate
[gate:18576] *** An error occurred in MPI_Isend
[gate:18576] *** on communicator MPI_COMM_WORLD
[gate:18576] *** MPI_ERR_RANK: invalid rank
[gate:18576] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18566] *** An error occurred in MPI_Isend
[gate:18566] *** on communicator MPI_COMM_WORLD
[gate:18566] *** MPI_ERR_RANK: invalid rank
[gate:18566] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18572] *** An error occurred in MPI_Isend
[gate:18572] *** on communicator MPI_COMM_WORLD
[gate:18572] *** MPI_ERR_RANK: invalid rank
[gate:18572] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18575] *** An error occurred in MPI_Isend
[gate:18575] *** on communicator MPI_COMM_WORLD
[gate:18575] *** MPI_ERR_RANK: invalid rank
[gate:18575] *** MPI_ERRORS_ARE_FATAL (goodbye)
Process 6 of 13 is alive on gate
[gate:18570] *** An error occurred in MPI_Isend
[gate:18570] *** on communicator MPI_COMM_WORLD
[gate:18570] *** MPI_ERR_RANK: invalid rank
[gate:18570] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18561] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[gate:18561] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166

-- 
----------------------------------------------------------------------
Karsten Bolding                    Bolding & Burchard Hydrodynamics
Strandgyden 25                     Phone: +45 64422058
DK-5466 Asperup                    Fax: +45 64422068
Denmark                            Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
----------------------------------------------------------------------
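
For reference, the neighbor handling described in the message can be sketched as below. This is only a minimal illustration in Python, not the model's actual Fortran code. The names `FILE_NO_NEIGHBOUR`, `to_mpi_ranks` and `EXAMPLE_PROC_NULL` are hypothetical. The sketch assumes the -1 in the neighbor file means "no neighbor" and translates it into whatever value the MPI library in use defines for MPI_PROC_NULL, since the MPI standard leaves that numeric value to the implementation and a hard-coded -1 is only valid where the library happens to use it.

```python
# Hypothetical sentinel used in the neighbour file for "no neighbour here".
FILE_NO_NEIGHBOUR = -1


def to_mpi_ranks(neighbours, proc_null, nprocs):
    """Translate neighbour ranks read from the file into ranks safe for MPI calls.

    neighbours: ranks as read from the file, with FILE_NO_NEIGHBOUR for gaps
    proc_null:  the MPI library's own MPI_PROC_NULL value
    nprocs:     size of the communicator, used to reject out-of-range ranks
    """
    ranks = []
    for rank in neighbours:
        if rank == FILE_NO_NEIGHBOUR:
            # Use the library's null process instead of the file's sentinel.
            ranks.append(proc_null)
        elif 0 <= rank < nprocs:
            ranks.append(rank)
        else:
            raise ValueError(f"neighbour rank {rank} outside 0..{nprocs - 1}")
    return ranks


# Arbitrary stand-in value for illustration only; in a real program the value
# would come from the MPI library itself (e.g. MPI.PROC_NULL in mpi4py).
EXAMPLE_PROC_NULL = -2

# Hypothetical neighbour list for one sub-domain: two real, two missing.
west_east_north_south = [3, FILE_NO_NEIGHBOUR, 7, FILE_NO_NEIGHBOUR]
print(to_mpi_ranks(west_east_north_south, EXAMPLE_PROC_NULL, nprocs=13))
```

Doing the translation once, right after reading the file, keeps the file format independent of the MPI library, and the range check turns a later MPI_ERR_RANK into an earlier, more specific error.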