On Fri, Mar 5, 2010 at 12:03 PM, Josh Hursey wrote:
> This type of failure is usually due to prelink'ing being left enabled on one
> or more of the systems. This has come up multiple times on the Open MPI
> list, but is actually a problem between BLCR and the Linux kernel. BLCR has
> a FAQ entry o
This type of failure is usually due to prelink'ing being left enabled
on one or more of the systems. This has come up multiple times on the
Open MPI list, but is actually a problem between BLCR and the Linux
kernel. BLCR has a FAQ entry on this that you will want to check out:
https://upc-
2010-03-05
马少杰
Dear Sir:
I want to use openmpi and blcr to checkpoint.However, I want restart the
check point
on other hosts. For example, I run mpi program using openmpi on
host1 and host2, and I save the checkpoint file at a nfs shared path.
Then I wan to restart the job (ompi-res