Hi,
I've got some code that uses Open MPI, and sometimes it crashes after printing something like:

[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun noticed that job rank 1 with PID 9658 on node mac1 exited on signal 6 (Aborted).
2 additional processes aborted (not shown)
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

In this case, all processes were running on the same machine, so it's not a connection problem. Is this a bug, or is something else wrong? Is there a way to increase the timeout?

Thanks...
