[ 
https://issues.apache.org/jira/browse/FLINK-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-14029:
-----------------------------------
    Labels: pull-request-available  (was: )

> Update Flink's Mesos scheduling behavior to reject all expired offers
> ---------------------------------------------------------------------
>
>                 Key: FLINK-14029
>                 URL: https://issues.apache.org/jira/browse/FLINK-14029
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Piyush Narang
>            Priority: Minor
>              Labels: pull-request-available
>
> While digging into why our Flink jobs weren't being scheduled on our internal 
> Mesos setup we noticed that we were hitting Mesos quota limits tied to the 
> way we've set up the Fenzo (https://github.com/Netflix/Fenzo/) library 
> defaults in the Flink project. 
> Behavior we noticed was that we got a bunch of offers from our Mesos master 
> (50+) out of which only 1 or 2 of them were super skewed and took up a huge 
> chunk of our disk resource quota. Thanks to this we were not sent any new / 
> different offers (as our usage at the time + resource offers reached our 
> Mesos disk quota). As the Flink / Fenzo Mesos scheduling code was not using 
> the 1-2 skewed disk offers they end up expiring. The way we've set up the 
> Fenzo scheduler is to use the default values on when to expire unused offers 
> (120s) and maximum number of unused offer leases at a time (4). Unfortunately 
> as we have a considerable number of outstanding expired offers (50+) we end 
> up in a situation where we reject only 4 or so every 2 mins and we never get 
> around to rejecting the super skewed disk ones which are stopping us from 
> scheduling our Flink job. Thanks to this we end up in a situation where our 
> job is waiting to be scheduled for more than an hour. 
> An option to work around this is to reject all expired offers at 2 minute 
> expiry time rather than hold on to them. This will allow Mesos to send 
> alternate offers that might be scheduled by Fenzo. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to