Did you try checkpointing a non-MPI application with BLCR on the
cluster? If that does not work then I would suspect that BLCR is not
working properly on the system.
However, if a non-MPI application can be checkpointed and restarted
correctly on this machine, then it may be something odd with the Open
MPI installation or runtime environment. To help debug here I would
need to know how Open MPI was configured and how the application was
run on the machine (command-line arguments, environment variables, ...).
I should note that for the program that you sent it is important that
you compile Open MPI with the Fault Tolerance Thread enabled to ensure
a timely checkpoint. Otherwise the checkpoint will be delayed until
the MPI program enters the MPI_Finalize function.
Let me know what you find out.
Josh
On Jun 16, 2009, at 5:08 PM, Kritiraj Sajadah wrote:
Hi Josh,
Thanks for the email. I have installed BLCR 0.8.1 and Open MPI 1.3 on
my laptop running Ubuntu 8.04, and it works fine there.
I have now tried the installation on the cluster at my university (on
one machine for now). The administrator installed it, and I am not
sure whether he followed the steps I gave him.
I am checkpointing a simple MPI application, which looks as follows:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>   /* declares system() */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 30");
    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 30");
    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 30");
    printf("bye \n");
    MPI_Finalize();
    return 0;
}
Do you think it's better to reinstall BLCR?
Thanks
Raj
--- On Tue, 6/16/09, Josh Hursey <jjhur...@open-mpi.org> wrote:
From: Josh Hursey <jjhur...@open-mpi.org>
Subject: Re: [OMPI users] vfs_write returned -14
To: "Open MPI Users" <us...@open-mpi.org>
Date: Tuesday, June 16, 2009, 6:42 PM
These are errors from BLCR. It may be a problem with your
BLCR installation and/or your application. Are you able to
checkpoint/restart a non-MPI application with BLCR on these
machines?
What kind of MPI application are you trying to checkpoint?
Some of the MPI interfaces are not fully supported at the
moment (outlined in the FT User Document that I mentioned in
a previous email).
-- Josh
On Jun 16, 2009, at 11:30 AM, Kritiraj Sajadah wrote:
Dear All,
I have installed Open MPI 1.3 and BLCR 0.8.1 on a Linux machine (Ubuntu).
However, when I try checkpointing an MPI application, I get
the following error:
- vfs_write returned -14
- file_header: write returned -14
Can someone help, please?
Regards,
Raj
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users