Hi,

On 15.04.2011 at 07:25, Asad Ali wrote:

> <snip>
> Yes. The entire job gets restarted. 

Maybe this is caused by a signal that Condor sends to the job, so that it gets 
terminated and, as a result, Condor restarts it. Can you trap the signals in your 
application as a test?
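
A minimal sketch of such a test, assuming the application (or a small wrapper 
around it) is written in Python; in C the same idea applies with sigaction(). 
The names WATCHED, log_signal and install_handlers are made up for this 
example, and the list of signals is only a guess at what Condor might send. 
Note that SIGKILL cannot be trapped, so if nothing gets logged before a 
restart, that would be one possible explanation.

```python
import os
import signal
import time

# Signals to watch. This list is an assumption; SIGKILL and SIGSTOP
# cannot be caught, so they are left out on purpose.
WATCHED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
           signal.SIGUSR1, signal.SIGUSR2)

received = []  # names of the signals seen so far, in arrival order


def log_signal(signum, frame):
    """Log the signal to stderr and keep running instead of terminating."""
    name = signal.Signals(signum).name
    received.append(name)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    message = f"{stamp} pid {os.getpid()} caught {name}\n"
    # Write straight to file descriptor 2 to avoid buffered I/O in a handler.
    os.write(2, message.encode())


def install_handlers(signums=WATCHED):
    """Route every signal in signums to log_signal."""
    for signum in signums:
        signal.signal(signum, log_signal)
```

Call install_handlers() once at start-up, before the real work begins, and 
check the job's stderr file after the next restart. If a line like 
"caught SIGTERM" shows up shortly before the restart, the signal is at least 
a strong hint.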


> If so, you had best talk to the Condor folks - it has nothing to do with Open 
> MPI, but is due to a job control flag you are passing to Condor.
> 
> I have talked to them several times. But most of the cluster users are 
> non-MPI users, and thus they don't have much knowledge about the configuration 
> of MPI with Condor.
> If you know any person who uses Condor for running MPI jobs then please let 
> me know.

Does Condor support Open MPI at all? In the past it had a special universe for 
MPICH(1), and only for an older version, to run parallel jobs under Condor. Has 
this changed?

-- Reuti


> Cheers,
> 
> Asad
>  
> 
> 
> On Apr 14, 2011, at 6:37 PM, Asad Ali wrote:
> 
> > Hi all,
> >
> > I am using Condor to run my MPI jobs on a large cluster of nodes. The jobs 
> > run fine, but after some time they automatically get restarted. What can be 
> > the reason?
> >
> > Cheers,
> >
> > Asad
> >
> > --
> >  "A Bayesian is one who, vaguely expecting a horse, and catching a glimpse 
> > of a donkey, strongly believes he has seen a mule."
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 