For a more dynamic approach to resource utilization, you can use something like this:
# dynamic.aurora
# Enqueue each individual work-item with:
#   aurora create -E work_item=$work_item -E resource_profile=graph_traversals west/service-account-name/prod/process_$work_item

class Profile(Struct):
  queue_name = Required(String)
  resources = Required(Resources)

HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)

work_on_one_item = Process(
  name = 'work_on_one_item',
  cmdline = '''
    do_work "{{work_item}}"
  ''',
)

task = Task(
  processes = [work_on_one_item],
  resources = '{{resources[{{resource_profile}}]}}',
)

job = Job(
  task = task,
  cluster = 'west',
  role = 'service-account-name',
  environment = 'prod',
  name = 'process_{{work_item}}',
)

resources = {
  'graph_traversals': HIGH_MEM,
  'compilations': HIGH_CPU,
}

jobs = [job.bind(resources = resources)]


On Wed, Feb 26, 2014 at 1:08 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:

> Sure. Yes, they are shell commands, and yes, they are provided different
> configuration on each run.
>
> In effect we have a number of different job types that are queued up,
> and we need to run them as quickly as possible. Each job type has
> different resource requirements. Every time we run a job, we provide
> different arguments (the "payload"). For example:
>
> $ ./do_something.sh SOME_ID            (Requires 1 CPU and 1GB RAM)
> $ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4GB RAM)
> [... there are about 12 of these ...]
>
> -Bryan
>
> On Wed, Feb 26, 2014 at 3:58 PM, Bill Farner <wfar...@apache.org> wrote:
> > Can you offer some more details on what the workload execution looks
> > like? Are these shell commands? An application that's provided
> > different configuration?
> >
> > -=Bill
> >
> > On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >
> >> Thanks, Kevin. The idea of always-on workers of varying sizes is
> >> effectively what we have right now in our non-Mesos world. The problem
> >> is that sometimes we end up with not enough workers for certain
> >> classes of jobs (e.g. High Memory), while part of the cluster sits
> >> idle.
> >>
> >> Conceptually, in my mind we would define approximately a dozen Tasks,
> >> one for each type of work we need to perform (with different resource
> >> requirements), and then run Jobs, each with a Task and a unique
> >> payload, but I don't think this model works with Mesos. It seems we'd
> >> need to create a unique Task for every Job.
> >>
> >> -Bryan
> >>
> >> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <kevi...@apache.org> wrote:
> >> > A job is a group of nearly-identical tasks plus some constraints like
> >> > rack diversity. The scheduler considers each task within a job
> >> > equivalently schedulable, so you can't vary things like resource
> >> > footprint. It's perfectly fine to have several jobs with just a
> >> > single task, as long as each has a different job key (which is
> >> > (role, environment, name)).
> >> >
> >> > Another approach is to have a bunch of uniform always-on workers
> >> > (in different sizes).
> >> > This can be expressed as a Service like so:
> >> >
> >> > # workers.aurora
> >> > class Profile(Struct):
> >> >   queue_name = Required(String)
> >> >   resources = Required(Resources)
> >> >   instances = Required(Integer)
> >> >
> >> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> >> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >> >
> >> > work_forever = Process(
> >> >   name = 'work_forever',
> >> >   cmdline = '''
> >> >     # TODO: Replace this with something that isn't pseudo-bash
> >> >     while true; do
> >> >       work_item=`take_from_work_queue {{profile.queue_name}}`
> >> >       do_work "$work_item"
> >> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
> >> >     done
> >> >   ''')
> >> >
> >> > task = Task(
> >> >   processes = [work_forever],
> >> >   resources = '{{profile.resources}}',  # Note this is static per queue-name.
> >> > )
> >> >
> >> > service = Service(
> >> >   task = task,
> >> >   cluster = 'west',
> >> >   role = 'service-account-name',
> >> >   environment = 'prod',
> >> >   name = '{{profile.queue_name}}_processor',
> >> >   instances = '{{profile.instances}}',  # Scale here.
> >> > )
> >> >
> >> > jobs = [
> >> >   service.bind(profile = Profile(
> >> >     resources = HIGH_MEM,
> >> >     queue_name = 'graph_traversals',
> >> >     instances = 50,
> >> >   )),
> >> >   service.bind(profile = Profile(
> >> >     resources = HIGH_CPU,
> >> >     queue_name = 'compilations',
> >> >     instances = 200,
> >> >   )),
> >> > ]
> >> >
> >> >
> >> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >> >
> >> >> Thanks, Bill.
> >> >>
> >> >> Am I correct in understanding that it is not possible to parameterize
> >> >> individual Jobs, just Tasks? Therefore, since I don't know the job
> >> >> definitions up front, I will have parameterized Task templates, and
> >> >> generate a new Task every time I need to run a Job?
> >> >>
> >> >> Is that the recommended route?
> >> >>
> >> >> Our work is very non-uniform, so I don't think work-stealing would be
> >> >> efficient for us.
> >> >>
> >> >> -Bryan
> >> >>
> >> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wfar...@apache.org> wrote:
> >> >> > Thanks for checking out Aurora!
> >> >> >
> >> >> > My short answer is that Aurora should handle thousands of short-lived
> >> >> > tasks/jobs per day without trouble. (If you proceed with this approach
> >> >> > and encounter performance issues, feel free to file tickets!) The DSL
> >> >> > does have some mechanisms for parameterization. In your case, since
> >> >> > you probably don't know all the job definitions upfront, you'll
> >> >> > probably want to parameterize with environment variables. I don't see
> >> >> > this described in our docs, but there's a little detail at the option
> >> >> > declaration [1].
> >> >> >
> >> >> > Another approach worth considering is work-stealing, using a single
> >> >> > job as your pool of workers. I would find this easier to manage, but
> >> >> > it would only be suitable if your work items are sufficiently uniform.
> >> >> >
> >> >> > Feel free to continue the discussion! We're also pretty active in our
> >> >> > IRC channel if you'd prefer that medium.
> >> >> >
> >> >> > [1]
> >> >> > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
> >> >> >
> >> >> > -=Bill
> >> >> >
> >> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >> >> >
> >> >> >> Hello,
> >> >> >>
> >> >> >> I am considering Aurora for a key component of our infrastructure.
> >> >> >> Awesome work being done here.
> >> >> >>
> >> >> >> My question is: How suitable is Aurora for running short-lived tasks?
> >> >> >>
> >> >> >> Background: We (Code Climate) do static analysis of tens of thousands
> >> >> >> of repositories every day. We run a variety of forms of analysis,
> >> >> >> with heterogeneous resource requirements, and thus our interest in
> >> >> >> Mesos.
> >> >> >>
> >> >> >> Looking at Aurora, a lot of the core features look very helpful to
> >> >> >> us. Where I am getting hung up is figuring out how to model
> >> >> >> short-lived tasks as tasks/jobs. Long-running resource allocations
> >> >> >> are not really an option for us due to the variation in our
> >> >> >> workloads.
> >> >> >>
> >> >> >> My first thought was to create a Task for each type of analysis we
> >> >> >> run, and then start a new Job with the appropriate Task every time we
> >> >> >> want to run analysis (regulated by a queue). This doesn't seem to
> >> >> >> work, though. I can't `aurora create` the same `.aurora` file
> >> >> >> multiple times with different Job names (as far as I can tell).
> >> >> >> Also, there is the problem of how to customize each Job slightly
> >> >> >> (e.g. a payload).
> >> >> >>
> >> >> >> An obvious alternative is to create a unique Task every time we want
> >> >> >> to run work. This would result in tens of thousands of tasks being
> >> >> >> created every day, and from what I can tell Aurora is not intended to
> >> >> >> be used like that. (Please correct me if I am wrong.)
> >> >> >>
> >> >> >> Basically, I would like to hook my job queue up to Aurora to perform
> >> >> >> the actual work. There are a dozen different types of jobs, each with
> >> >> >> different performance requirements. Every time a job runs, it has a
> >> >> >> unique payload containing the definition of the work to be performed.
> >> >> >>
> >> >> >> Can Aurora be used this way? If so, what is the proper way to model
> >> >> >> this with respect to Jobs and Tasks?
> >> >> >>
> >> >> >> Any/all help is appreciated.
> >> >> >>
> >> >> >> Thanks!
> >> >> >>
> >> >> >> -Bryan
> >> >> >>
> >> >> >> --
> >> >> >> Bryan Helmkamp, Founder, Code Climate
> >> >> >> br...@codeclimate.com / 646-379-1810 / @brynary
> >> >>
> >> >>
> >> >> --
> >> >> Bryan Helmkamp, Founder, Code Climate
> >> >> br...@codeclimate.com / 646-379-1810 / @brynary
> >>
> >>
> >> --
> >> Bryan Helmkamp, Founder, Code Climate
> >> br...@codeclimate.com / 646-379-1810 / @brynary
>
>
> --
> Bryan Helmkamp, Founder, Code Climate
> br...@codeclimate.com / 646-379-1810 / @brynary
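
To make the enqueue step from the first message concrete, here is a minimal
sketch of a per-item submission wrapper around the dynamic.aurora file above.
The script name and its argument layout are hypothetical, and the trailing
config-file argument to `aurora create` is an assumption about the client
invocation; the -E bindings and the job-key pattern come straight from the
comment in dynamic.aurora.

#!/usr/bin/env bash
# submit_work_item.sh (hypothetical) -- invoked once per queued work item.
# Usage: ./submit_work_item.sh graph_traversals SOME_ID
set -eu

resource_profile="$1"   # must match a key in the resources dict, e.g. graph_traversals
work_item="$2"          # the per-run payload handed to do_work via {{work_item}}

# Each invocation creates a separate single-task job under a unique job key
# (role, environment, name), so the resource profile can differ per item.
aurora create \
  -E work_item="$work_item" \
  -E resource_profile="$resource_profile" \
  "west/service-account-name/prod/process_${work_item}" \
  dynamic.aurora

Because each work item gets its own job name (process_<work_item>), the same
dynamic.aurora file can be submitted as many times as there are items, which
addresses the "same .aurora file, different Job names" problem from the
original question.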
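
The pseudo-bash loop in workers.aurora can likewise be written as real shell.
A minimal sketch, assuming take_from_work_queue, do_work, and
tell_work_queue_finished are placeholder commands supplied by your queueing
system (the same placeholders used in the example), and that
take_from_work_queue exits non-zero when no item is available:

#!/usr/bin/env bash
# work_forever.sh (hypothetical) -- a concrete version of the worker loop.
# Usage: ./work_forever.sh graph_traversals
set -u

queue_name="$1"

while true; do
  # Block until an item is available; back off briefly on transient failure.
  if work_item=$(take_from_work_queue "$queue_name"); then
    do_work "$work_item"
    tell_work_queue_finished "$queue_name" "$work_item"
  else
    sleep 5
  fi
done

In the Service version, the equivalent loop runs inline in the Process
cmdline, with {{profile.queue_name}} bound per job instead of passed as an
argument.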