On Sep 3, 2010, at 5:10 PM, David Singleton wrote:
> On 09/03/2010 10:05 PM, Jeff Squyres wrote:
>> On Sep 3, 2010, at 12:16 AM, Ralph Castain wrote:
>>
>>> Backing off the polling rate requires more application-specific logic like
>>> that offered below, so it is a little difficult for us to i
On 09/03/2010 10:05 PM, Jeff Squyres wrote:
> On Sep 3, 2010, at 12:16 AM, Ralph Castain wrote:
>> Backing off the polling rate requires more application-specific logic like that
>> offered below, so it is a little difficult for us to implement at the MPI
>> library level. Not saying we eventually won't
On Sep 3, 2010, at 12:16 AM, Ralph Castain wrote:
> Backing off the polling rate requires more application-specific logic like
> that offered below, so it is a little difficult for us to implement at the
> MPI library level. Not saying we eventually won't - just not sure anyone
> quite knows ho
In the upcoming 1.5 series, we will introduce a new "sensor" framework to help
resolve such issues. Among other things, it will automatically track (if
requested) the size of a sentinel file, CPU usage, and memory footprint and
will terminate the job if any exceed user-specified limits (e.g., fi
Hi David:
On Fri, Sep 03, 2010 at 10:50:02AM +1000, David Singleton wrote:
>
> I'm sure this has been discussed before but having watched hundreds of
> thousands of cpuhrs being wasted by difficult-to-detect hung jobs, I'd
> be keen to know why there isn't some sort of "spin-wait backoff" option.
I'm sure this has been discussed before but having watched hundreds of
thousands of cpuhrs being wasted by difficult-to-detect hung jobs, I'd
be keen to know why there isn't some sort of "spin-wait backoff" option.
For example, a way to specify spin-wait for x seconds/cycles/iterations
then backo
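
To make the spin-then-back-off idea concrete, here is a minimal sketch in Python. It is not anything Open MPI provides: the function name `wait_with_backoff` and all of its parameters are hypothetical, and `is_done` stands in for whatever nonblocking completion check the application has (for example, a wrapper around an `MPI_Test`-style call). It busy-polls for a fixed number of iterations, then sleeps between polls with a delay that doubles up to a cap.

```python
import time


def wait_with_backoff(is_done, spin_iters=1000, first_sleep=1e-6,
                      max_sleep=1e-2, sleep=time.sleep):
    """Poll is_done() until it returns True; return the number of polls made.

    All names and defaults here are illustrative assumptions, not an
    existing MPI or Open MPI interface.
    """
    polls = 0

    # Phase 1: spin without yielding, so short waits keep low latency.
    for _ in range(spin_iters):
        polls += 1
        if is_done():
            return polls

    # Phase 2: back off. Sleep between polls, doubling the delay each time
    # up to max_sleep, so a hung or long-idle rank stops burning a core.
    delay = first_sleep
    while True:
        sleep(delay)
        polls += 1
        if is_done():
            return polls
        delay = min(delay * 2, max_sleep)
```

The trade-off is the one raised in the thread: a larger `spin_iters` protects latency for tightly coupled codes, while a larger `max_sleep` lowers CPU use for idle or hung ranks at the cost of a slower wake-up. Reasonable values depend on the application, which is part of why this is hard to choose at the library level.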