[ https://issues.apache.org/jira/browse/FLINK-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-14029:
-----------------------------------
    Labels: pull-request-available  (was: )

> Update Flink's Mesos scheduling behavior to reject all expired offers
> ---------------------------------------------------------------------
>
> Key: FLINK-14029
> URL: https://issues.apache.org/jira/browse/FLINK-14029
> Project: Flink
> Issue Type: Bug
> Reporter: Piyush Narang
> Priority: Minor
> Labels: pull-request-available
>
> While digging into why our Flink jobs weren't being scheduled on our internal
> Mesos setup, we noticed that we were hitting Mesos quota limits tied to the
> way we've set up the Fenzo (https://github.com/Netflix/Fenzo/) library
> defaults in the Flink project.
>
> We received a large batch of offers (50+) from our Mesos master, of which
> only 1 or 2 were heavily skewed and took up a huge chunk of our disk resource
> quota. Because of this, we were not sent any new or different offers: our
> usage at the time plus the outstanding resource offers had reached our Mesos
> disk quota. As the Flink/Fenzo Mesos scheduling code was not using the 1-2
> skewed disk offers, they ended up expiring. The Fenzo scheduler is set up with
> the default values for when to expire unused offers (120s) and for the maximum
> number of expired offer leases to reject at a time (4). Unfortunately, as we
> have a considerable number of outstanding expired offers (50+), we end up
> rejecting only 4 or so every 2 minutes and never get around to rejecting the
> heavily skewed disk offers that are preventing our Flink job from being
> scheduled. As a result, our job waits to be scheduled for more than an hour.
>
> One option to work around this is to reject all expired offers at the
> 2-minute expiry time rather than holding on to them. This would allow Mesos
> to send alternate offers that Fenzo may be able to schedule on.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
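The rejection bottleneck described in the issue can be illustrated with a toy model. The sketch below is hypothetical and is not Fenzo's or Flink's code. It assumes expired offers are rejected in FIFO order, that the skewed offers may sit at the back of that queue (worst case), and that no new offers arrive in the meantime, which matches the quota-blocked situation in the report. `OfferExpirySimulation` and `cyclesToRejectAll` are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of the behavior described in FLINK-14029; not Fenzo or Flink code.
public class OfferExpirySimulation {

    // Counts the expiry cycles needed until every expired offer has been rejected,
    // when at most maxRejectsPerCycle offers may be rejected per cycle.
    // Assumes FIFO rejection order and that no new offers arrive meanwhile.
    static int cyclesToRejectAll(int expiredOffers, int maxRejectsPerCycle) {
        if (expiredOffers < 0 || maxRejectsPerCycle <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        Deque<Integer> expired = new ArrayDeque<>();
        for (int id = 0; id < expiredOffers; id++) {
            expired.addLast(id); // offer ids; in the worst case the skewed offers are last
        }
        int cycles = 0;
        while (!expired.isEmpty()) {
            cycles++;
            // Reject up to the cap in this cycle; the rest stay held until the next cycle.
            for (int r = 0; r < maxRejectsPerCycle && !expired.isEmpty(); r++) {
                expired.pollFirst();
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        final int expirySecs = 120;   // default offer expiry mentioned in the report
        final int expiredOffers = 50; // lower bound of the "50+" outstanding expired offers
        int capped = cyclesToRejectAll(expiredOffers, 4);                // cap of 4 per cycle
        int rejectAll = cyclesToRejectAll(expiredOffers, expiredOffers); // reject every expired offer
        System.out.println("capped: " + capped + " cycles, ~" + capped * expirySecs + " s");
        System.out.println("reject all: " + rejectAll + " cycle(s), ~" + rejectAll * expirySecs + " s");
    }
}
```

Taking 50 expired offers as a lower bound, this model needs 13 cycles (about 1560 s, roughly 26 minutes) before the last expired offer is rejected under a cap of 4, versus a single 120 s cycle when every expired offer is rejected at expiry. The model ignores offers being re-sent after rejection and other scheduling effects, so it makes no attempt to reproduce the hour-plus wait that was actually observed.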