On Aug 18, 2009, at 11:36 AM, Jean Potsam wrote:

Dear ALL,
I am trying to checkpoint an MPI application using the self component. I had a look at the Open MPI FT User's Guide, Draft 1.4, but I am still unsure.

I have installed openmpi as follows:

jean$ ./configure --prefix=/home/jean/openmpi/ --enable-debug --enable-mpi-profile --enable-mpi-cxx --enable-binaries --enable-trace --enable-static=yes --enable-debug --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr --enable-ft-thread --enable-mpi-threads=yes

jean$ make all install

My questions are:

Q1) Have I properly configured openmpi with self?

Yes, it looks like you have configured it correctly. To double-check, look in the config.log file in the build directory for the following lines (the result should say 'yes'):
----------------
configure:87103: checking if MCA component crs:self can compile
configure:87105: result: yes
----------------

I recently fixed a number of bugs with the 'self' CRS functionality. So you will want to make sure you are using a recent version of either the development trunk (anything after r21777) or the v1.3 branch (anything after r21798).


In the document, it is said:
"To be absolutely clear: these functions are to be provided by the application - they are not included in the open mpi library"

Q2) Does this mean that I will have to write my own checkpoint, continue, and restart functions and function calls?

The 'self' checkpointer requires the application to write its own checkpoint, continue, and restart functions. These functions must have a precise signature since they are called by Open MPI. In particular they need to look like:
  int opal_crs_self_user_checkpoint(char **restart_cmd);
  int opal_crs_self_user_continue(void);
  int opal_crs_self_user_restart(void);
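
As a rough illustration of what the application side might look like, here is a minimal sketch of the three callbacks using the default names. It is not the attached sample program. The state variable `app_counter`, the file name `personal-cr.ckpt`, the convention of returning 0 on success and -1 on failure, and leaving `*restart_cmd` set to NULL are all assumptions made for this sketch; check the C/R FT User's Guide for the exact semantics.

```c
#include <stdio.h>

/* Hypothetical application state: a single counter (sketch only). */
static int app_counter = 0;

/* Hypothetical checkpoint file name, chosen for this sketch only. */
static const char *ckpt_file = "personal-cr.ckpt";

/* Intended to save the application state when a checkpoint is requested. */
int opal_crs_self_user_checkpoint(char **restart_cmd)
{
    FILE *fp = fopen(ckpt_file, "w");
    if (fp == NULL) {
        return -1;              /* assumed failure convention */
    }
    fprintf(fp, "%d\n", app_counter);
    fclose(fp);

    /* Assumption: no custom restart command is supplied here. */
    *restart_cmd = NULL;
    return 0;
}

/* Intended to run when the application continues after a checkpoint. */
int opal_crs_self_user_continue(void)
{
    return 0;                   /* nothing to undo in this sketch */
}

/* Intended to reload the saved state when the application is restarted. */
int opal_crs_self_user_restart(void)
{
    int value;
    FILE *fp = fopen(ckpt_file, "r");
    if (fp == NULL) {
        return -1;
    }
    if (fscanf(fp, "%d", &value) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    app_counter = value;
    return 0;
}
```

In a real application these functions would sit alongside the MPI code and save whatever state the application needs to resume.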

The 'crs_self_prefix' MCA parameter will allow you to customize the function names a bit. For example:
  shell$ mpirun -np 2 -am ft-enable-cr -mca crs_self_prefix my_personal my-app

This will cause Open MPI to look for functions with the following signatures:
  int my_personal_checkpoint(char **restart_cmd);
  int my_personal_continue(void);
  int my_personal_restart(void);
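
To make the naming rule concrete: judging from the two sets of signatures above, the function name appears to be the prefix, an underscore, and one of "checkpoint", "continue", or "restart". The helper below, `build_callback_name`, is a hypothetical name invented for this illustration and is not part of Open MPI; it only shows how the names are composed.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not an Open MPI function): writes
 * "<prefix>_<suffix>" into out. Returns 0 on success, or -1 if
 * the result does not fit in out_len bytes.
 */
int build_callback_name(const char *prefix, const char *suffix,
                        char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s_%s", prefix, suffix);
    if (n < 0 || (size_t)n >= out_len) {
        return -1;              /* encoding error or truncated name */
    }
    return 0;
}
```

With the prefix "my_personal" and the suffix "checkpoint" this produces "my_personal_checkpoint", and with "opal_crs_self_user" it produces the default name "opal_crs_self_user_checkpoint".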



Q3) Has anyone had experience with self checkpointing? I would really appreciate it if a guide could be made available.

The C/R FT User's Guide is the only guide that I know of out there. I attached a sample program that takes advantage of the 'self' CRS system.

To compile:
  mpicc personal-cr.c -export -export-dynamic -o personal-cr

To run with default function names:
  shell$ mpirun -np 2 -am ft-enable-cr personal-cr

To run with custom function names:
  shell$ mpirun -np 2 -am ft-enable-cr -mca crs_self_prefix my_personal personal-cr


-- Josh

Attachment: personal-cr.c
Description: Binary data



Thanks a lot

cheers

Jean

