+aurora dev list

There are many mechanisms built into Aurora to prevent these scenarios; you may want to reach out to them for insight.
We have discussed some API rate limiting within Mesos, but this is near the limit of policy that Mesos could enforce, as we don't understand the semantics of the tasks being launched. Rate limiting within Mesos also doesn't solve the problem of a flapping task within Marathon.

---------- Forwarded message ----------
From: Dick Davies <d...@hellooperator.net>
Date: Wed, Apr 30, 2014 at 11:30 AM
Subject: protecting mesos from fat fingers
To: u...@mesos.apache.org

Managed to take out a mesos slave today with a typo while launching a marathon app, and wondered if there are throttles/limits that can be applied to repeated launches to limit the risk of such mistakes in the future.

I started a thread on the marathon list ( https://groups.google.com/forum/?hl=en#!topic/marathon-framework/4iWLqTYTvgM )

[ TL:DR: marathon throws an app that will never deploy correctly at slaves until the disk fills with debris and the slave dies ]

but I suppose this could be something available in mesos itself.

I can't find a lot of advice about operational aspects of Mesos admin; could others here provide some good advice about their experience in preventing failed task deploys from causing trouble on their clusters?

Thanks!