This typically means that one or more of the rcp/scp or rsh/ssh commands failed. FileM should print an error message when one of the copy commands fails. Try turning the verbose level up to 10 to see if it indicates any problems:
 -mca filem_rsh_verbose 10
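
It may also be worth checking by hand that the copy commands work between the nodes without prompting for a password. The script below is only a rough sketch of such a check, not something FileM itself runs: it assumes ssh/scp are the commands in use, and the host name and paths in the usage note are placeholders to replace with your own. The helper names (build_checks, run_checks) are made up for illustration.

```python
import subprocess


def build_checks(host, remote_path, local_path):
    """Build the ssh and scp commands to try by hand.

    BatchMode=yes makes ssh/scp fail instead of prompting for a password,
    which is closer to how a non-interactive copy would behave.
    host, remote_path and local_path are placeholders supplied by the caller.
    """
    return [
        ["ssh", "-o", "BatchMode=yes", host, "true"],
        ["scp", "-o", "BatchMode=yes", "-r", f"{host}:{remote_path}", local_path],
    ]


def run_checks(cmds, timeout=30):
    """Run each command and return (command, exit code, stderr) tuples.

    An exit code of -1 means the command could not be started or timed out.
    """
    results = []
    for cmd in cmds:
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            results.append((cmd, proc.returncode, proc.stderr.strip()))
        except (OSError, subprocess.TimeoutExpired) as exc:
            results.append((cmd, -1, str(exc)))
    return results
```

For example, run_checks(build_checks("remote-node", "/tmp/ckpt-test", "/tmp/ckpt-copy")) run from the node where mpirun was started should return exit code 0 for both commands; a non-zero code, together with the captured stderr, points at the ssh/scp setup rather than at FileM.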

Can you send me the MCA parameters that you are setting? That may help narrow down the problem as well. Also, I cleaned up some of the filem (and snapc) error reporting in the development trunk, if you want to give that a try.

Let me know what you find out.

Best,
Josh

On Apr 30, 2009, at 6:40 AM, Bouguerra mohamed slim wrote:

Hello,
I have a problem with the FileM module when I try to checkpoint on a remote host without a shared file system. I am using the new Open MPI 1.3.2, and the problem is the same as in version 1.3.1. When I use an NFS file system it works, so I guess the problem is in FileM.

[azur-6.fr:23223] filem:rsh: wait_all(): Wait failed (-1)
[azur-6.fr:23223] [[48784,0],0] ORTE_ERROR_LOG: Error in file /home/grenoble/msbouguerra/openmpi-1.3.2/orte/mca/snapc/full/snapc_full_global.c at line 1054

--
Cordialement,
Mohamed-Slim BOUGUERRA    PhD student INRIA-Grenoble / Projet MOAIS
ENSIMAG - antenne de Montbonnot
ZIRST 51, avenue Jean Kuntzmann
38330 MONTBONNOT SAINT MARTIN France
Tel :+33 (0)4 76 61 20 79
Fax :+33 (0)4 76 61 20 99

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
