Hi David:

On Fri, Sep 03, 2010 at 10:50:02AM +1000, David Singleton wrote:
>
> I'm sure this has been discussed before but having watched hundreds of
> thousands of cpuhrs being wasted by difficult-to-detect hung jobs, I'd
> be keen to know why there isn't some sort of "spin-wait backoff" option.
> For example, a way to specify spin-wait for x seconds/cycles/iterations
> then backoff to lighter and lighter cpu usage.  At least that way, hung
> jobs would become self-evident.
>
> Maybe there is already some way of doing this?

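To make the quoted request concrete, here is a minimal sketch of a
spin-then-backoff wait loop. It is an illustration only: it is not an
Open MPI option, and it is not the code from the post linked below. The
names wait_with_backoff, is_done, spin_iterations, first_sleep and
max_sleep are all hypothetical. In an MPI program, is_done might wrap a
non-blocking completion check such as MPI_Test; that is an assumption.

```python
import time


def wait_with_backoff(is_done, spin_iterations=1000, first_sleep=1e-6,
                      max_sleep=0.01, sleep=time.sleep):
    """Poll is_done() until it returns True; return the number of polls.

    Hypothetical helper: the first spin_iterations failed polls are a
    busy-wait. After that, the loop sleeps between polls and doubles the
    sleep each time, capped at max_sleep.
    """
    polls = 0
    delay = first_sleep
    while True:
        polls += 1
        if is_done():
            return polls
        if polls <= spin_iterations:
            continue  # spin phase: full CPU use, lowest latency
        sleep(delay)  # backoff phase: give up the CPU between polls
        delay = min(delay * 2, max_sleep)
```

With a loop like this, a job that is making progress stays in the spin
phase, while a hung job settles at one poll every max_sleep seconds, so
its CPU usage drops and the hang shows up in ordinary monitoring.
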
For my solution to this, see
http://www.open-mpi.org/community/lists/users/2010/07/13731.php

HTH,
Douglas.
-- 
  Douglas Guptill                       voice: 902-461-9749
  Research Assistant, LSC 4640          email: douglas.gupt...@dal.ca
  Oceanography Department               fax:   902-494-3877
  Dalhousie University
  Halifax, NS, B3H 4J1, Canada