[OMPI users] libmpi_f90.so.0 problem

2010-04-14 Thread max marconi
I have just installed Open MPI on my system and tried to run the example Hello_f90. Executing it produced the following error: error while loading shared libraries: libmpi_f90.so.0: cannot open shared object file: No such file or directory. The library with libmpi_f90 is located i

Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-14 Thread Fernando Lemos
On Wed, Apr 14, 2010 at 5:25 AM, Hideyuki Jitsumoto wrote: > Fernando, > > Thank you for your reply. > I tried to patch the file you mentioned, but the output did not change. I didn't test the patch, tbh. I'm using 1.5 nightly snapshots, and it works great. >>Are you using a shared file system?

Re: [OMPI users] Don't crash on node failures

2010-04-14 Thread Ralph Castain
Yes - followed a few microseconds later with a SIGKILL if it didn't terminate. The daemon exits shortly thereafter, and if the proc is -still- somehow alive, it kills itself once it sees the daemon is gone. On Apr 14, 2010, at 7:29 AM, Jürgen Kaiser wrote: > What happens exactly when a job or
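The SIGTERM-then-SIGKILL escalation described in this reply can be sketched in a few lines. This is only an illustration of the general pattern, not ORTE's actual code: the helper names `terminate_then_kill` and `spawn_child` and the grace period are hypothetical, and the grace period is far longer than the few microseconds mentioned above. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Source of a throwaway child process used only for this demonstration.
# With the argument "ignore" it ignores SIGTERM, mimicking a process that
# does not terminate when asked politely.
CHILD = (
    "import signal, sys, time\n"
    "if sys.argv[1] == 'ignore':\n"
    "    signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def spawn_child(ignore_sigterm):
    """Start the demo child and wait until it has set its signal disposition."""
    mode = "ignore" if ignore_sigterm else "default"
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, mode],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # blocks until the child prints 'ready'
    return proc


def terminate_then_kill(proc, grace_seconds=0.5):
    """Send SIGTERM, wait briefly, then send SIGKILL if the process is still alive.

    Returns the number of the signal that ended the process,
    or None if it exited normally.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
    return -proc.returncode if proc.returncode < 0 else None


if __name__ == "__main__":
    stubborn = spawn_child(ignore_sigterm=True)
    print("stubborn child ended by signal", terminate_then_kill(stubborn))
    polite = spawn_child(ignore_sigterm=False)
    print("polite child ended by signal", terminate_then_kill(polite))
```

A process that wants to shut down cleanly would install a SIGTERM handler that finishes quickly, since the follow-up SIGKILL gives it no second chance.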

Re: [OMPI users] Don't crash on node failures

2010-04-14 Thread Jürgen Kaiser
What happens exactly when a job or node crashes? Does orte send a SIGTERM to each process? Best regards, Jürgen Durga Choudhury wrote: > This would be a very welcoming new feature for me as well. My two > thumbs up when it happens. > > Best regards > Durga > > > On Tue, Apr 13, 2010 at 10:28 AM,

Re: [OMPI users] OpenMPI Checkpoint/Restart is failed

2010-04-14 Thread Hideyuki Jitsumoto
Fernando, Thank you for your reply. I tried to patch the file you mentioned, but the output did not change. >Are you using a shared file system? You need to use a shared file system for checkpointing with 1.4.1: What is the shared file system? Do you mean NFS, Lustre, and so on? (I'm sorry about