On Apr 15, 2011, at 2:59 AM, Reuti wrote:

> Hi,
> 
> On 15.04.2011 at 07:25, Asad Ali wrote:
> 
>> <snip>
>> Yes. The entire job gets restarted. 
> 
> Maybe this is caused by a signal sent to the job by Condor, so that it gets 
> terminated and as a result Condor restarts it. Can you trap the signals in 
> your application for a test?
> 
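As a rough illustration of the signal-trapping test suggested above, here is a
minimal sketch in Python. The list of signals is an assumption (the thread does
not say which signal, if any, Condor sends), and the names `SIGNALS_TO_TRAP`,
`received`, `log_signal` and `install_handlers` are made up for this example.
The same idea carries over to other languages, e.g. `sigaction()` in C.

```python
import os
import signal
import time

# Assumption: signals a batch system might plausibly send. Which one
# (if any) Condor actually sends is exactly what this test should reveal.
SIGNALS_TO_TRAP = [signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
                   signal.SIGUSR1, signal.SIGUSR2]

received = []  # (signal name, timestamp) pairs, in arrival order


def log_signal(signum, frame):
    """Record the signal and report it on stderr instead of dying silently."""
    name = signal.Signals(signum).name
    received.append((name, time.time()))
    # os.write avoids re-entering a buffered stream from inside the handler.
    os.write(2, f"pid {os.getpid()}: caught {name}\n".encode())


def install_handlers(signals=SIGNALS_TO_TRAP):
    """Replace the default action of each listed signal with log_signal."""
    for sig in signals:
        signal.signal(sig, log_signal)


if __name__ == "__main__":
    install_handlers()
    # Self-test: send one signal to this process and show that it was logged.
    os.kill(os.getpid(), signal.SIGUSR1)
    time.sleep(0.1)
    print([name for name, _ in received])
```

In a real job, `install_handlers()` would be called once at startup, and the
stderr file of the job would then show whether a signal arrived shortly before
the restart. Note that SIGKILL and SIGSTOP cannot be trapped, so an empty log
does not rule out a signal entirely.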
> 
>> If so, you had best talk to the Condor folks - it has nothing to do with 
>> Open MPI, but is due to a job control flag you are passing to Condor.
>> 
>> I have talked to them several times. But most of the cluster users are 
>> non-MPI users and thus they don't have much knowledge about the 
>> configuration of MPI with Condor.
>> If you know anyone who uses Condor for running MPI jobs then please let 
>> me know.
> 
> Is the use of Open MPI supported by Condor? Formerly they had a special 
> universe for running parallel jobs under Condor that supported only 
> MPICH(1), and only an older version of it. Has this changed?

See https://bugzilla.redhat.com/show_bug.cgi?id=537232

At one time, it appears such a script existed. You might start with the one 
offered here, and/or check on the web for updates.

I would also go to the Condor web site:

http://www.cs.wisc.edu/condor/

A search for "openmpi" revealed several presentations on how to make this work.



> 
> -- Reuti
> 
> 
>> Cheers,
>> 
>> Asad
>> 
>> 
>> 
>> On Apr 14, 2011, at 6:37 PM, Asad Ali wrote:
>> 
>>> Hi all,
>>> 
>>> I am using Condor to run my MPI jobs on a large cluster of nodes. The jobs 
>>> run fine but after some time they automatically get restarted. What can be 
>>> the reason?
>>> 
>>> Cheers,
>>> 
>>> Asad
>>> 
>>> --
>>> "A Bayesian is one who, vaguely expecting a horse, and catching a glimpse 
>>> of a donkey, strongly believes he has seen a mule."
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> -- 
>> "A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of 
>> a donkey, strongly believes he has seen a mule."
> 
