Two quick observations:

1. Open MPI 1.6.2 is pretty old, and it's not the last release in the 1.6.x series. If you can, update to Open MPI 1.6.5, which has lots of bug fixes over 1.6.2. Even better, upgrade to Open MPI 1.8.4 (i.e., the latest stable release), which has oodles of bug fixes and optimizations over the 1.6 series.
2. The error message notes that the job was killed by signal 1. Signal 1 (i.e., "HUP") is *usually* the product of either a bug in the application or some kind of external entity (e.g., a resource scheduler that determines that your job has run too long, so it kills it).

> On Mar 11, 2015, at 1:36 AM, Saad Raza <saadr_...@yahoo.com> wrote:
>
> Dear all,
>
> I do not know whether I should ask this on openmpi forum or Amber forum but
> mpirun seems to crash randomly when they are subjected to long calculation. I
> have build openmpi from openmpi-1.6.2. I have used the following commands for
> configuring and installation:
>
> ./configure --prefix=/usr/lib64/mpi/gcc/openmpi --exec-prefix=/usr/lib64/mpi/gcc/openmpi
> make all install
>
> Some calculations run completely fine but some of them crash randomly with
> the following type of error.
>
> --------------------------------------------------------------------------
> mpirun noticed that process rank 3 with PID 5028 on node drsikandarserver
> exited on signal 1 (Hangup).
> --------------------------------------------------------------------------
>
> I am using nohup before the mpirun command. The general structure of the
> command is
>
> nohup mpirun -np 8 sander.MPI ....
>
> Regards
> Saad Raza
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/03/26451.php

--
Jeff Squyres
jsquy...@cisco.com
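To make the "exited on signal 1 (Hangup)" message concrete, here is a minimal POSIX-shell sketch, assuming only the standard `kill`, `sleep`, and `nohup` utilities. It maps signal number 1 to its name and shows what `nohup` changes when a single process receives SIGHUP. The `hup_status` helper is hypothetical and exists only for illustration. The sketch covers one process started directly under `nohup`; it does not show whether the ranks that `mpirun` launches inherit that setting, which depends on how Open MPI starts them.

```shell
#!/bin/sh
# Minimal sketch: signal 1 is SIGHUP, and nohup makes a command ignore it.
# Assumes a POSIX shell plus the standard kill, sleep, and nohup utilities.

# Translate signal number 1 into its name (POSIX shells print "HUP").
kill -l 1

# hup_status (hypothetical helper): run `[PREFIX] sleep 3` in the background,
# send it SIGHUP after one second, and print the exit status reported by wait.
# A status of 129 (128 + 1) means the process was terminated by signal 1.
hup_status() {
    "$@" sleep 3 >/dev/null 2>&1 &
    pid=$!
    sleep 1                      # let nohup (if used) ignore SIGHUP and exec sleep
    kill -HUP "$pid" 2>/dev/null
    wait "$pid"
    echo "$?"
}

hup_status          # plain sleep: terminated by SIGHUP
hup_status nohup    # nohup sleep: SIGHUP ignored, sleep finishes normally
```

On a typical system where the calling shell does not itself ignore SIGHUP, the plain command should report 129 and the `nohup` one 0. A status of 129 is how a shell reports the same "signal 1" that `mpirun` names in its error message.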