On 01/10/2015 10:24, Emyr James wrote:
"ORTE has lost communication with its daemon located on node:
hostname: node123
This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failu
Hi,
I am using Open MPI with Platform LSF on our cluster, which has 10GbE
connectivity.
Sometimes things work fine, but we get a lot of occurrences of MPI jobs
not getting off the ground, and the following appears in the log:
"ORTE has lost communication with its daemon located on node:
hostna