Hi,

On 15.04.2011 at 07:25, Asad Ali wrote:

> <snip>
> Yes. The entire job gets restarted.

Maybe this is caused by a signal sent to the job by Condor, so that it gets terminated and, as a result, Condor restarts it. Can you trap the signals in your application for a test?

> If so, you had best talk to the Condor folks - it has nothing to do with Open
> MPI, but is due to a job control flag you are passing to Condor.
>
> I have talked to them several times. But most of the cluster users are
> non-MPI users and thus they don't have much knowledge about the configuration
> of MPI with Condor.
> If you know any person who uses Condor for running MPI jobs then please let
> me know.

Is the use of Open MPI supported by Condor? In former times they had a special universe for running parallel jobs under Condor, which supported only MPICH(1), and only an older version of it. Did this change?

-- Reuti

> Cheers,
>
> Asad
>
> On Apr 14, 2011, at 6:37 PM, Asad Ali wrote:
>
> > Hi all,
> >
> > I am using Condor to run my MPI jobs on a large cluster of nodes. The jobs
> > run fine but after some time they automatically get restarted. What can be
> > the reason?
> >
> > Cheers,
> >
> > Asad
> >
> > --
> > "A Bayesian is one who, vaguely expecting a horse, and catching a glimpse
> > of a donkey, strongly believes he has seen a mule."
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users