>> The problem is that sometimes we end up with not enough workers for
>> certain classes of jobs (e.g. High Memory), while part of the cluster
>> sits idle.

There's no prior art for this, but the Aurora API is actually designed in
a way that would make it possible to have a 'supervisor' job that tunes
the number of instances in each job by sending RPCs to the scheduler.
You'd be trailblazing here, but it's another path to consider.

-=Bill
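To make the supervisor idea concrete, here is a minimal sketch of such a
control loop. It is not a real integration: get_queue_depth() and
set_instance_count() are hypothetical stubs standing in for a real
queue-depth lookup and for whatever scheduler RPC (or aurora client
invocation) actually resizes a job. The queue names, instance limits, and
job-key components reuse Kevin's example further down the thread; the
scaling ratio is an arbitrary placeholder.

    # supervisor.py -- a sketch of the 'supervisor' idea, not a real
    # Aurora client API. The stubs below must be replaced with actual
    # integrations before any of this would do useful work.
    import time

    ITEMS_PER_WORKER = 10          # assumed ratio of queued items per worker
    LIMITS = {                     # (min, max) instances per queue/job
        'graph_traversals': (5, 50),
        'compilations': (20, 200),
    }

    def get_queue_depth(queue_name):
        # Stub: replace with a real queue-depth lookup (e.g. Redis LLEN).
        return 0

    def set_instance_count(job_key, count):
        # Stub: replace with the scheduler RPC (or an invocation of the
        # aurora client) that grows or shrinks the job to `count` instances.
        print('would resize %s to %d instances' % ('/'.join(job_key), count))

    def desired_instances(queue_name):
        lo, hi = LIMITS[queue_name]
        depth = get_queue_depth(queue_name)
        return max(lo, min(hi, depth // ITEMS_PER_WORKER + 1))

    while True:
        for queue_name in LIMITS:
            # Job key is (role, environment, name), per Kevin's note below.
            job_key = ('service-account-name', 'prod',
                       '%s_processor' % queue_name)
            set_instance_count(job_key, desired_instances(queue_name))
        time.sleep(60)

Running this as its own always-on single-instance Aurora job would keep
the cluster-sizing logic inside the cluster itself, which seems to be
what Bill is gesturing at.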
On Wed, Feb 26, 2014 at 12:58 PM, Bill Farner <wfar...@apache.org> wrote:

> Can you offer some more details on what the workload execution looks
> like? Are these shell commands? An application that's provided different
> configuration?
>
> -=Bill
>
> On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
>
>> Thanks, Kevin. The idea of always-on workers of varying sizes is
>> effectively what we have right now in our non-Mesos world. The problem
>> is that sometimes we end up with not enough workers for certain
>> classes of jobs (e.g. High Memory), while part of the cluster sits
>> idle.
>>
>> Conceptually, in my mind we would define approximately a dozen Tasks,
>> one for each type of work we need to perform (with different resource
>> requirements), and then run Jobs, each with a Task and a unique
>> payload, but I don't think this model works with Mesos. It seems we'd
>> need to create a unique Task for every Job.
>>
>> -Bryan
>>
>> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <kevi...@apache.org> wrote:
>>
>> > A job is a group of nearly-identical tasks plus some constraints, like
>> > rack diversity. The scheduler considers each task within a job
>> > equivalently schedulable, so you can't vary things like resource
>> > footprint. It's perfectly fine to have several jobs with just a single
>> > task, as long as each has a different job key (which is (role,
>> > environment, name)).
>> >
>> > Another approach is to have a bunch of uniform always-on workers (in
>> > different sizes). This can be expressed as a Service like so:
>> >
>> > # workers.aurora
>> > class Profile(Struct):
>> >   queue_name = Required(String)
>> >   resources = Required(Resources)
>> >   instances = Required(Integer)
>> >
>> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
>> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>> >
>> > work_forever = Process(
>> >   name = 'work_forever',
>> >   cmdline = '''
>> >     # TODO: Replace this with something that isn't pseudo-bash
>> >     while true; do
>> >       work_item=`take_from_work_queue {{profile.queue_name}}`
>> >       do_work "$work_item"
>> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
>> >     done
>> >   ''')
>> >
>> > task = Task(
>> >   processes = [work_forever],
>> >   resources = '{{profile.resources}}',  # Note this is static per queue name.
>> > )
>> >
>> > service = Service(
>> >   task = task,
>> >   cluster = 'west',
>> >   role = 'service-account-name',
>> >   environment = 'prod',
>> >   name = '{{profile.queue_name}}_processor',
>> >   instances = '{{profile.instances}}',  # Scale here.
>> > )
>> >
>> > jobs = [
>> >   service.bind(profile = Profile(
>> >     resources = HIGH_MEM,
>> >     queue_name = 'graph_traversals',
>> >     instances = 50,
>> >   )),
>> >   service.bind(profile = Profile(
>> >     resources = HIGH_CPU,
>> >     queue_name = 'compilations',
>> >     instances = 200,
>> >   )),
>> > ]
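For what it's worth, a concrete stand-in for the pseudo-bash work_forever
loop above might look like the following. This is one arbitrary choice of
queue backend (a Redis list, via the redis-py package); `do_work` remains
a hypothetical command on the PATH, just as in Kevin's sketch. The
Process cmdline would then become something like
`python worker.py {{profile.queue_name}}`.

    # worker.py -- one possible concrete version of the pseudo-bash
    # loop above. Assumes a Redis list per queue and a hypothetical
    # `do_work` command; swap in whatever queue and runner you use.
    import subprocess
    import sys

    import redis

    def main(queue_name):
        conn = redis.StrictRedis(host='localhost', port=6379)
        while True:
            # BLPOP blocks until an item arrives, so idle workers are cheap.
            _, work_item = conn.blpop(queue_name)
            subprocess.check_call(['do_work', work_item])
            # Acknowledge completion, mirroring tell_work_queue_finished.
            conn.lpush(queue_name + ':finished', work_item)

    if __name__ == '__main__':
        main(sys.argv[1])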
>> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <br...@codeclimate.com> wrote:
>> >
>> >> Thanks, Bill.
>> >>
>> >> Am I correct in understanding that it is not possible to parameterize
>> >> individual Jobs, just Tasks? Therefore, since I don't know the job
>> >> definitions up front, I would have parameterized Task templates and
>> >> would generate a new Task every time I need to run a Job?
>> >>
>> >> Is that the recommended route?
>> >>
>> >> Our work is very non-uniform, so I don't think work-stealing would be
>> >> efficient for us.
>> >>
>> >> -Bryan
>> >>
>> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wfar...@apache.org> wrote:
>> >>
>> >> > Thanks for checking out Aurora!
>> >> >
>> >> > My short answer is that Aurora should handle thousands of short-lived
>> >> > tasks/jobs per day without trouble. (If you proceed with this approach
>> >> > and encounter performance issues, feel free to file tickets!) The DSL
>> >> > does have some mechanisms for parameterization. In your case, since
>> >> > you probably don't know all the job definitions upfront, you'll
>> >> > probably want to parameterize with environment variables. I don't see
>> >> > this described in our docs, but there's a little detail at the option
>> >> > declaration [1].
>> >> >
>> >> > Another approach worth considering is work-stealing, using a single
>> >> > job as your pool of workers. I would find this easier to manage, but
>> >> > it would only be suitable if your work items are sufficiently uniform.
>> >> >
>> >> > Feel free to continue the discussion! We're also pretty active in our
>> >> > IRC channel if you'd prefer that medium.
>> >> >
>> >> > [1]
>> >> > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>> >> >
>> >> > -=Bill
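Sketching the template route Bryan describes: a single checked-in .aurora
file whose job name and payload are left as unbound mustache variables
and supplied at submission time. Everything below is an assumption for
illustration: `analyze` is a stand-in for a real analyzer binary, the
resource numbers are placeholders, and the exact binding mechanism
(`--bind` here) is inferred from the option declaration Bill links to,
not verified.

    # analysis.aurora -- sketch of a parameterized task template.
    # {{job_name}} and {{payload}} are bound per invocation; 'analyze'
    # is a hypothetical stand-in for a real analyzer binary.
    run_analysis = Process(
      name = 'run_analysis',
      cmdline = 'analyze --payload "{{payload}}"'
    )

    analysis_task = Task(
      name = 'run_analysis',
      processes = [run_analysis],
      resources = Resources(cpu = 4.0, ram = 16 * GB, disk = 32 * GB)
    )

    jobs = [
      Job(
        task = analysis_task,
        cluster = 'west',
        role = 'service-account-name',
        environment = 'prod',
        name = '{{job_name}}'  # unique per work item, bound at create time
      )
    ]

Submission would then look something like
`aurora create west/service-account-name/prod/analysis_123 analysis.aurora
--bind job_name=analysis_123 --bind payload=...` (again, flag spelling per
the linked option declaration, not verified here).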
>> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >> I am considering Aurora for a key component of our infrastructure.
>> >> >> Awesome work being done here.
>> >> >>
>> >> >> My question is: How suitable is Aurora for running short-lived tasks?
>> >> >>
>> >> >> Background: We (Code Climate) do static analysis of tens of thousands
>> >> >> of repositories every day. We run a variety of forms of analysis,
>> >> >> with heterogeneous resource requirements, and thus our interest in
>> >> >> Mesos.
>> >> >>
>> >> >> Looking at Aurora, a lot of the core features look very helpful to
>> >> >> us. Where I am getting hung up is figuring out how to model
>> >> >> short-lived work as tasks/jobs. Long-running resource allocations are
>> >> >> not really an option for us due to the variation in our workloads.
>> >> >>
>> >> >> My first thought was to create a Task for each type of analysis we
>> >> >> run, and then start a new Job with the appropriate Task every time we
>> >> >> want to run analysis (regulated by a queue). This doesn't seem to
>> >> >> work, though. I can't `aurora create` the same `.aurora` file
>> >> >> multiple times with different Job names (as far as I can tell). There
>> >> >> is also the problem of how to customize each Job slightly (e.g. with
>> >> >> a payload).
>> >> >>
>> >> >> An obvious alternative is to create a unique Task every time we want
>> >> >> to run work. This would result in tens of thousands of tasks being
>> >> >> created every day, and from what I can tell Aurora is not intended to
>> >> >> be used like that. (Please correct me if I am wrong.)
>> >> >>
>> >> >> Basically, I would like to hook my job queue up to Aurora to perform
>> >> >> the actual work. There are a dozen different types of jobs, each with
>> >> >> different performance requirements. Every time a job runs, it has a
>> >> >> unique payload containing the definition of the work to be performed.
>> >> >>
>> >> >> Can Aurora be used this way? If so, what is the proper way to model
>> >> >> this with respect to Jobs and Tasks?
>> >> >>
>> >> >> Any/all help is appreciated.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> -Bryan
>> >> >>
>> >> >> --
>> >> >> Bryan Helmkamp, Founder, Code Climate
>> >> >> br...@codeclimate.com / 646-379-1810 / @brynary
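Closing the loop on the "hook my job queue up to Aurora" question, a
driver along these lines could pop payloads and create one short-lived
job per work item against the template above. It inherits every earlier
assumption (a Redis-backed queue, the `--bind` flag, the hypothetical
analysis.aurora template), so treat it as a sketch of the shape, not a
working integration; completed jobs would still need cleanup.

    # driver.py -- sketch: one Aurora job per work item, created by
    # shelling out to the aurora client with per-item bindings.
    import subprocess
    import uuid

    import redis

    JOB_PATH = 'west/service-account-name/prod'

    def create_job(payload):
        # Render a unique job key so repeated creates don't collide.
        job_name = 'analysis_%s' % uuid.uuid4().hex[:8]
        subprocess.check_call([
            'aurora', 'create',
            '%s/%s' % (JOB_PATH, job_name),
            'analysis.aurora',
            '--bind', 'job_name=%s' % job_name,
            '--bind', 'payload=%s' % payload,
        ])

    def main():
        conn = redis.StrictRedis()
        while True:
            # Hypothetical queue of serialized work definitions.
            _, payload = conn.blpop('analysis_jobs')
            create_job(payload.decode('utf-8'))

    if __name__ == '__main__':
        main()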