I built the NASA Overflow 1.8ab code yesterday with openmpi-1.0a1r7632. It runs fine with 4 or 8 Opteron processors on a Myrinet Linux cluster, but if I increase the number of processors to 20, I get errors like this:

[e053:01260] *** An error occurred in MPI_Free_mem
[e030:15585] *** An error occurred in MPI_Free_mem
[e013:27621] *** An error occurred in MPI_Free_mem
[e030:15585] *** on communicator MPI_COMM_WORLD
[e032:14179] *** An error occurred in MPI_Free_mem
[e053:01260] *** on communicator MPI_COMM_WORLD
[e030:15585] *** MPI_ERR_NO_MEM: out of memory
[e053:01260] *** MPI_ERR_NO_MEM: out of memory
[e013:27621] *** on communicator MPI_COMM_WORLD
[e030:15585] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14179] *** on communicator MPI_COMM_WORLD
[e053:01260] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e013:27621] *** MPI_ERR_NO_MEM: out of memory
[e012:30846] *** An error occurred in MPI_Free_mem
[e012:30846] *** on communicator MPI_COMM_WORLD
[e012:30846] *** MPI_ERR_NO_MEM: out of memory
[e012:30846] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14179] *** MPI_ERR_NO_MEM: out of memory
[e013:27621] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14179] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14178] *** An error occurred in MPI_Free_mem
[e032:14178] *** on communicator MPI_COMM_WORLD
[e032:14178] *** MPI_ERR_NO_MEM: out of memory
[e032:14178] *** MPI_ERRORS_ARE_FATAL (goodbye)
DIMENSIONS FOR COARSE LEVEL(S), GRID 1:
[e011:12272] spawn: in job_state_callback(jobid = 1, state = 0xa)
[e011:12272] spawn: in job_state_callback(jobid = 1, state = 0x9)
20 processes killed (possibly by Open MPI)
[e011:12272] sess_dir_finalize: found proc session dir empty - deleting
[e011:12272] sess_dir_finalize: job session dir not empty - leaving

I am running using PBSPro and the Intel 9 compiler. Any ideas on what I could be doing wrong?? The size of my test problem is very small.

Thanx,
Bernie Borenstein
The Boeing Company