Dear all,
I have figured it out. It was a simple issue: I had not added the "blcr lib" to
the $PATH environment variable. Without it, the checkpoint operation worked,
but the restart operation could not complete successfully, which was weird.
Best regards
Xianjun Meng
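
For anyone hitting the same symptom, here is a minimal sketch of the fix
described above. It assumes BLCR is installed under /usr/local/blcr, which is
a hypothetical prefix and not taken from the message. The message mentions
$PATH; shared libraries are normally located through LD_LIBRARY_PATH, so the
sketch sets both rather than guessing which one was missing.

```shell
# Hypothetical BLCR install prefix; adjust to the real location.
BLCR_PREFIX="${BLCR_PREFIX:-/usr/local/blcr}"

# Make the BLCR tools (cr_checkpoint, cr_restart) and libcr visible.
export PATH="$BLCR_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$BLCR_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report whether the restart tool can now be found.
if command -v cr_restart >/dev/null 2>&1; then
    echo "cr_restart found at: $(command -v cr_restart)"
else
    echo "cr_restart not found; check BLCR_PREFIX=$BLCR_PREFIX"
fi
```

These settings probably need to be in effect on every node that takes part in
the restart, for example through the shell startup file that non-interactive
logins read.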
On Dec 23, 2010 at 5:35 PM, 孟宪军 wrote:
> My main ques
I'm not sure there is any documentation yet - not much clamor for it. :-/
It would really help if you included the error message. Otherwise, all I can do
is guess, which wastes both of our time :-(
My best guess is that the port reservation didn't get passed down to the MPI
procs properly - but
Can anyone point me towards the most recent documentation for using
srun and openmpi?
I followed what I found on the web, enabling the MpiPorts config in Slurm
and using the --resv-ports switch, but I'm getting an error from Open MPI
during setup.
I'm using Slurm 2.1.15 and Open MPI 1.5 with PSM.
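
For reference, a minimal sketch of the setup this message describes. It
assumes that "MpiPorts" refers to the MpiParams port range in slurm.conf; the
port range, application name, and task count below are placeholders, not
values from the thread. The environment variable in the check is, as far as I
know, how Slurm hands the reserved ports to the job step, so treat that part
as an assumption too.

```shell
# Assumed slurm.conf entry (placeholder range):
#   MpiParams=ports=12000-12999

APP=./my_mpi_app   # hypothetical application binary
NTASKS=4           # hypothetical task count
CMD="srun --resv-ports -n $NTASKS $APP"

if command -v srun >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Show the port range Slurm has configured.
    scontrol show config | grep -i MpiParams
    # Assumption: reserved ports appear in SLURM_STEP_RESV_PORTS in each task.
    srun --resv-ports -n "$NTASKS" env | grep SLURM_STEP_RESV_PORTS
    # Launch the application with reserved ports.
    $CMD
else
    echo "would run: $CMD"
fi
```

If the variable is empty inside the step, the reservation is not reaching the
MPI processes, which would match the guess given in the reply. Posting the
exact Open MPI error text is still the quickest way to confirm.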
On Fri, Dec 17, 2010 at 5:43 PM, Sashi Balasingam wrote:
> Hi,
> I recently started on an MPI-based, 'real-time', pipelined-processing
> application, and the application fails due to large time-jitter in sending
> and receiving messages. Here is the related info:
>
> 1) Platform:
> a) Intel Box: Two
My main question is:
After I finished the checkpoint operation on a simple task that ran on
two machines, I could only restart it on one machine. If I run the following
command to force ompi-restart to run the program on two machines:
*ompi-restart -hostfile ./machine_names ompi_global
Dear all,
I have been trying the checkpoint/restart function of Open MPI recently, and
after several failures and checking a lot of the documentation, I am still very
confused about how to configure the checkpoint/restart function. Can anybody give me a
$HOME/.openmpi/mca-params.conf script and introduce me what p