Hi Ralph,

Thank you for your reply.

On Fri, Apr 15, 2011 at 1:16 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Not much we can say with that little info. :-/
>
> Are you using Open MPI? If so, what version?
>

Yes. The version is mpirun (Open MPI) 1.2.7rc2.



> When you say the job gets restarted, do you mean that Condor restarts the
> entire MPI job?


Yes. The entire job gets restarted.


> If so, you had best talk to the Condor folks - it has nothing to do with
> Open MPI, but is due to a job control flag you are passing to Condor.
>

I have talked to them several times, but most of the cluster users are
non-MPI users, so they don't know much about configuring MPI with Condor.
If you know anyone who uses Condor to run MPI jobs, please let me know.

Cheers,

Asad


>
>
> On Apr 14, 2011, at 6:37 PM, Asad Ali wrote:
>
> > Hi all,
> >
> > I am using Condor to run my MPI jobs on a large cluster of nodes. The
> jobs run fine, but after some time they automatically get restarted. What
> could be the reason?
> >
> > Cheers,
> >
> > Asad
> >
> > --
> >  "A Bayesian is one who, vaguely expecting a horse, and catching a
> glimpse of a donkey, strongly believes he has seen a mule."
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>



