Hi all,

I am running an Open MPI (1.3.4) program with 200 parallel processes, but the program is terminated with:

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 77967 on node n342 exited on signal 9 (Killed).
--------------------------------------------------------------------------

After searching, I found that signal 9 (SIGKILL) means:

"the process is currently in an unworkable state and should be terminated with extreme prejudice"

"If a process does not respond to any other termination signals, sending it a SIGKILL signal will almost always cause it to go away. The system will generate SIGKILL for a process itself under some unusual conditions where the program cannot possibly continue to run (even to run a signal handler)."

But the error message does not indicate any possible reason for the termination.

There is a for loop in main(). If the number of iterations is small (< 200), the program works well, but as the number grows larger, the program gets killed with SIGKILL.

The cluster where I am running the MPI program does not allow running debugging tools. If I run it on a workstation, it takes a very long time (for > 200 iterations) to reproduce the error.

What can I do to find the possible bugs? Any help is really appreciated.

Thanks,
Jack
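
One check that needs no debugger is to log each process's memory use once per loop iteration. This is only a sketch, and it rests on an assumption the error message does not confirm: that the kill comes from memory growing with the iteration count (for example a leak) until the system or the batch scheduler kills the process. The function names peak_rss and log_peak_rss are hypothetical helpers, not part of Open MPI. The code uses only the POSIX getrusage() call, and it assumes Linux, where ru_maxrss is reported in kilobytes.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Return the peak resident set size of the calling process so far.
 * Assumption: Linux, where ru_maxrss is in kilobytes (macOS reports bytes).
 * Returns -1 if getrusage() fails. */
long peak_rss(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Write one line with the rank, the iteration number, and the peak RSS,
 * then flush so the line is not lost in a buffer if the process is
 * killed shortly afterwards. Returns the value that was logged. */
long log_peak_rss(FILE *out, int rank, int iter)
{
    long rss = peak_rss();
    fprintf(out, "rank %d iter %d peak_rss %ld\n", rank, iter, rss);
    fflush(out);
    return rss;
}
```

Calling log_peak_rss(stderr, rank, i) at the end of every iteration of the for loop in main(), with rank obtained from MPI_Comm_rank, gives one line per iteration per process. If the reported value keeps climbing until the kill, memory exhaustion is a likely cause. If it stays flat, the cause is probably something else, such as a job time limit.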
