So, to summarize: from a large-cluster perspective, offers are held so the
scheduler can complete some expensive operations without repeating them
frequently, and without blocking the callback thread.
That brings up an interesting point. Is there any out-of-the-box
instrumentation of scheduler performance, so that one could simulate the
workload, resource filtering, and priority/preemption policies to arrive at
some tuning for these parameters?
If not, what would you suggest as the critical metrics to track? Unscheduled
task queue size, degraded scheduling latency, etc.?
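
For concreteness, something like the following sketch is what I have in mind;
the class and counter names here are made up for illustration, not from
Aurora's actual stats API:

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical counters for scheduler health; names are illustrative only.
    final class SchedulingMetrics {
      // Tasks accepted but not yet matched to an offer.
      final AtomicLong pendingTaskQueueSize = new AtomicLong();
      // Count of tasks whose wait before scheduling exceeded a threshold.
      final AtomicLong degradedSchedulingEvents = new AtomicLong();

      // Record a task leaving the pending queue; times are in epoch millis.
      void onTaskScheduled(long enqueuedAtMs, long nowMs, long degradedThresholdMs) {
        pendingTaskQueueSize.decrementAndGet();
        if (nowMs - enqueuedAtMs > degradedThresholdMs) {
          degradedSchedulingEvents.incrementAndGet();
        }
      }
    }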

Thx
On Wednesday, April 30, 2014 9:38 PM, Bill Farner <wfar...@apache.org> wrote:
 
Aurora holds offers for a few reasons:
- To avoid blocking the mesos driver callback thread while matching offers
to pending tasks
- To enable preemption (determine whether cluster resources are exhausted,
and a low priority task should be evicted for a high priority one)
- To optimize scheduling decisions (choosing the best offer based on things
like failure domains)

As clusters grow very large in terms of slaves and tasks, these features
become necessary for the scheduler to remain responsive and predictable.
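
To make the first point concrete, here's a simplified sketch of the pattern
(not Aurora's actual code; class and method names are illustrative):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.mesos.Protos.Offer;
    import org.apache.mesos.SchedulerDriver;

    // Hold offers in a queue so the driver callback can return immediately;
    // matching against pending tasks happens on a separate thread.
    class OfferBank {
      private final Queue<Offer> held = new ConcurrentLinkedQueue<Offer>();
      private final ExecutorService matcher = Executors.newSingleThreadExecutor();

      // Called from the mesos driver callback thread; must not block.
      void resourceOffers(final SchedulerDriver driver, Iterable<Offer> offers) {
        for (Offer offer : offers) {
          held.add(offer);
        }
        matcher.execute(new Runnable() {
          @Override public void run() {
            matchPendingTasks(driver);
          }
        });
      }

      private void matchPendingTasks(SchedulerDriver driver) {
        // Walk the held offers, weigh them against pending tasks, preemption
        // candidates, and failure domains, then launch or keep holding.
      }
    }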

As you experienced, this has the effect of starving cohort frameworks.  The
mesos team plans to implement offer revocation [1] to mitigate this.  In
the meantime, you can tune the amount of time Aurora holds offers as a
workaround with the min_offer_hold_time [2] command-line argument, e.g.:

-min_offer_hold_time=1secs


Unfortunately, this value is used in conjunction with a hard-coded jitter
[3], so you still have an upper bound of one minute hold time.  If this
presents an issue, we'd happily accept a patch to make the jitter window
tunable as well!
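
For reference, the change would be small; a rough, untested sketch following
the pattern of the existing min_offer_hold_time argument (the flag name and
default here are just a suggestion):

    import com.twitter.common.args.Arg;
    import com.twitter.common.args.CmdLine;
    import com.twitter.common.quantity.Amount;
    import com.twitter.common.quantity.Time;

    // Inside AsyncModule (or wherever the offer settings live):
    class AsyncModuleSketch {
      // Hypothetical flag making the jitter window tunable, defaulting to
      // roughly the currently hard-coded window.
      @CmdLine(name = "offer_hold_jitter_window",
          help = "Maximum amount of random jitter to add to min_offer_hold_time.")
      private static final Arg<Amount<Integer, Time>> OFFER_HOLD_JITTER_WINDOW =
          Arg.create(Amount.of(1, Time.MINUTES));
    }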


-=Bill

[1] https://issues.apache.org/jira/browse/MESOS-354
[2] https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L95-98
[3] https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L323

On Wed, Apr 30, 2014 at 3:19 PM, mohit soni <mohitsoni1...@gmail.com> wrote:

> We observed that Aurora's CPU share in the Mesos master dashboard spikes up
> to 100%, even when Aurora is not running any job. Looking at the code, I
> figured that the scheduler holds on to resourceOffers even when there are no
> tasks to be scheduled, and doesn't decline the offers immediately.
>
> It looks like an optimization, where the TaskLaunchers are kept primed with
> resourceOffers, so that the Job can be run as soon as it's scheduled (if
> task requirements are satisfied).
>
> But this leads to an offer starvation problem for other peer frameworks
> that tend to decline offers if they don't have tasks to be scheduled (for
> the timeout period).
>
> How can we handle this in scenarios where Aurora is running with other peer
> frameworks?
>
> Thanks
> Mohit
>
