On Wed, Feb 26, 2014 at 7:45 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> Got it. Thanks. Do finished Jobs and Tasks get garbage collected
> automatically at some point? Otherwise it seems like they will stack up
> pretty fast. (We might run hundreds of thousands of jobs in a day.)

Jobs are garbage-collected after a configurable period of inactivity. This
is tuned on the scheduler with the command-line arg history_prune_threshold;
the default is currently 2 days.

> BTW, Aurora does not seem to like the resources =
> '{{resources[{{resource_profile}}]}}' part. I tried to fix it, but keep
> getting:
>
> InvalidConfigError: Expected dictionary argument, got
> '{{resources[{{resource_profile}}]}}'

Kevin -- does the DSL support nested interpolation? Either way, maybe you
meant this:

task = Task(processes = [work_on_one_item],
            resources = '{{resources[{{work_item}}]}}')

> (For now I'm using a different .aurora file for each resource
> configuration.)
>
> Best,
>
> -Bryan
>
> On Wed, Feb 26, 2014 at 9:04 PM, Kevin Sweeney <kevi...@apache.org> wrote:
> > And after a bit of code spelunking, the semantics you want already exist
> > (just undocumented). Updated the ticket to update the documentation.
> >
> > On Wed, Feb 26, 2014 at 6:00 PM, Kevin Sweeney <kevi...@apache.org> wrote:
> >
> >> The example I gave is somewhat syntactically invalid due to coding via
> >> email, but that's more or less what the interface will look like. I also
> >> filed https://issues.apache.org/jira/browse/AURORA-236 for more
> >> first-class support of the semantics I think you want (though currently
> >> you can fake it by setting max_failures to a very high number).
> >>
> >> On Wed, Feb 26, 2014 at 5:33 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >>
> >>> Thanks, Kevin. That pretty much looks like exactly what I need.
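The suggested Task above relies on nested mustache interpolation, and whether the DSL supports that is exactly the open question in this thread. The intended innermost-first semantics of '{{resources[{{resource_profile}}]}}' can be sketched outside the DSL in plain Python; the resolver below is purely illustrative (not Pystachio), though the profile names mirror the examples later in the thread:

```python
import re

# Stand-ins for the Resources structs from the .aurora examples.
HIGH_MEM = {'cpu': 8.0, 'ram_gb': 32, 'disk_gb': 64}
HIGH_CPU = {'cpu': 16.0, 'ram_gb': 4, 'disk_gb': 64}

resources = {
    'graph_traversals': HIGH_MEM,
    'compilations': HIGH_CPU,
}

def resolve(template, bindings):
    """Hypothetical innermost-first resolver for a nested template."""
    # Inner pass: substitute each {{name}} from the bindings.
    inner = re.sub(r'\{\{(\w+)\}\}', lambda m: bindings[m.group(1)], template)
    # Outer pass: interpret resources[key] as a dictionary lookup.
    key = re.match(r'resources\[(\w+)\]$', inner).group(1)
    return resources[key]

profile = resolve('resources[{{resource_profile}}]',
                  {'resource_profile': 'graph_traversals'})
```

The point is only that two resolution passes are required: one to fill in the profile name, one to perform the map lookup.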
> >>> -Bryan
> >>>
> >>> On Wed, Feb 26, 2014 at 8:16 PM, Kevin Sweeney <kevi...@apache.org> wrote:
> >>>
> >>> > For a more dynamic approach to resource utilization you can use
> >>> > something like this:
> >>> >
> >>> > # dynamic.aurora
> >>> > # Enqueue each individual work-item with:
> >>> > #   aurora create -E work_item=$work_item -E resource_profile=graph_traversals \
> >>> > #     west/service-account-name/prod/process_$work_item
> >>> > class Profile(Struct):
> >>> >   queue_name = Required(String)
> >>> >   resources = Required(Resources)
> >>> >
> >>> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> >>> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >>> >
> >>> > work_on_one_item = Process(
> >>> >   name = 'work_on_one_item',
> >>> >   cmdline = '''
> >>> >     do_work "{{work_item}}"
> >>> >   ''',
> >>> > )
> >>> >
> >>> > task = Task(processes = [work_on_one_item],
> >>> >             resources = '{{resources[{{resource_profile}}]}}')
> >>> >
> >>> > job = Job(
> >>> >   task = task,
> >>> >   cluster = 'west',
> >>> >   role = 'service-account-name',
> >>> >   environment = 'prod',
> >>> >   name = 'process_{{work_item}}',
> >>> > )
> >>> >
> >>> > resources = {
> >>> >   'graph_traversals': HIGH_MEM,
> >>> >   'compilations': HIGH_CPU,
> >>> > }
> >>> >
> >>> > jobs = [job.bind(resources = resources)]
> >>> >
> >>> > On Wed, Feb 26, 2014 at 1:08 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >>> >
> >>> >> Sure. Yes, they are shell commands and yes they are provided different
> >>> >> configuration on each run.
> >>> >>
> >>> >> In effect we have a number of different job types that are queued up,
> >>> >> and we need to run as quickly as possible. Each job type has different
> >>> >> resource requirements. Every time we run the job, we provide different
> >>> >> arguments (the "payload").
> >>> >> For example:
> >>> >>
> >>> >> $ ./do_something.sh SOME_ID            (Requires 1 CPU and 1 GB RAM)
> >>> >> $ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4 GB RAM)
> >>> >> [... there are about 12 of these ...]
> >>> >>
> >>> >> -Bryan
> >>> >>
> >>> >> On Wed, Feb 26, 2014 at 3:58 PM, Bill Farner <wfar...@apache.org> wrote:
> >>> >>
> >>> >> > Can you offer some more details on what the workload execution looks
> >>> >> > like? Are these shell commands? An application that's provided
> >>> >> > different configuration?
> >>> >> >
> >>> >> > -=Bill
> >>> >> >
> >>> >> > On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >>> >> >
> >>> >> >> Thanks, Kevin. The idea of always-on workers of varying sizes is
> >>> >> >> effectively what we have right now in our non-Mesos world. The
> >>> >> >> problem is that sometimes we end up with not enough workers for
> >>> >> >> certain classes of jobs (e.g. High Memory), while part of the
> >>> >> >> cluster sits idle.
> >>> >> >>
> >>> >> >> Conceptually, in my mind we would define approximately a dozen
> >>> >> >> Tasks, one for each type of work we need to perform (with different
> >>> >> >> resource requirements), and then run Jobs, each with a Task and a
> >>> >> >> unique payload, but I don't think this model works with Mesos. It
> >>> >> >> seems we'd need to create a unique Task for every Job.
> >>> >> >>
> >>> >> >> -Bryan
> >>> >> >>
> >>> >> >> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <kevi...@apache.org> wrote:
> >>> >> >>
> >>> >> >> > A job is a group of nearly-identical tasks plus some constraints,
> >>> >> >> > like rack diversity. The scheduler considers each task within a
> >>> >> >> > job equivalently schedulable, so you can't vary things like
> >>> >> >> > resource footprint.
> >>> >> >> > It's perfectly fine to have several jobs with just a single
> >>> >> >> > task, as long as each has a different job key (which is (role,
> >>> >> >> > environment, name)).
> >>> >> >> >
> >>> >> >> > Another approach is to have a bunch of uniform always-on workers
> >>> >> >> > (in different sizes). This can be expressed as a Service like so:
> >>> >> >> >
> >>> >> >> > # workers.aurora
> >>> >> >> > class Profile(Struct):
> >>> >> >> >   queue_name = Required(String)
> >>> >> >> >   resources = Required(Resources)
> >>> >> >> >   instances = Required(Integer)
> >>> >> >> >
> >>> >> >> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> >>> >> >> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >>> >> >> >
> >>> >> >> > work_forever = Process(
> >>> >> >> >   name = 'work_forever',
> >>> >> >> >   cmdline = '''
> >>> >> >> >     # TODO: Replace this with something that isn't pseudo-bash
> >>> >> >> >     while true; do
> >>> >> >> >       work_item=`take_from_work_queue {{profile.queue_name}}`
> >>> >> >> >       do_work "$work_item"
> >>> >> >> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
> >>> >> >> >     done
> >>> >> >> >   ''')
> >>> >> >> >
> >>> >> >> > task = Task(
> >>> >> >> >   processes = [work_forever],
> >>> >> >> >   resources = '{{profile.resources}}',  # Note this is static per queue-name.
> >>> >> >> > )
> >>> >> >> >
> >>> >> >> > service = Service(
> >>> >> >> >   task = task,
> >>> >> >> >   cluster = 'west',
> >>> >> >> >   role = 'service-account-name',
> >>> >> >> >   environment = 'prod',
> >>> >> >> >   name = '{{profile.queue_name}}_processor',
> >>> >> >> >   instances = '{{profile.instances}}',  # Scale here.
> >>> >> >> > )
> >>> >> >> >
> >>> >> >> > jobs = [
> >>> >> >> >   service.bind(profile = Profile(
> >>> >> >> >     resources = HIGH_MEM,
> >>> >> >> >     queue_name = 'graph_traversals',
> >>> >> >> >     instances = 50,
> >>> >> >> >   )),
> >>> >> >> >   service.bind(profile = Profile(
> >>> >> >> >     resources = HIGH_CPU,
> >>> >> >> >     queue_name = 'compilations',
> >>> >> >> >     instances = 200,
> >>> >> >> >   )),
> >>> >> >> > ]
> >>> >> >> >
> >>> >> >> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >>> >> >> >
> >>> >> >> >> Thanks, Bill.
> >>> >> >> >>
> >>> >> >> >> Am I correct in understanding that it is not possible to
> >>> >> >> >> parameterize individual Jobs, just Tasks? Therefore, since I
> >>> >> >> >> don't know the job definitions up front, I will have
> >>> >> >> >> parameterized Task templates, and generate a new Task every time
> >>> >> >> >> I need to run a Job?
> >>> >> >> >>
> >>> >> >> >> Is that the recommended route?
> >>> >> >> >>
> >>> >> >> >> Our work is very non-uniform so I don't think work-stealing
> >>> >> >> >> would be efficient for us.
> >>> >> >> >>
> >>> >> >> >> -Bryan
> >>> >> >> >>
> >>> >> >> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wfar...@apache.org> wrote:
> >>> >> >> >>
> >>> >> >> >> > Thanks for checking out Aurora!
> >>> >> >> >> >
> >>> >> >> >> > My short answer is that Aurora should handle thousands of
> >>> >> >> >> > short-lived tasks/jobs per day without trouble. (If you
> >>> >> >> >> > proceed with this approach and encounter performance issues,
> >>> >> >> >> > feel free to file tickets!) The DSL does have some mechanisms
> >>> >> >> >> > for parameterization. In your case, since you probably don't
> >>> >> >> >> > know all the job definitions upfront, you'll probably want to
> >>> >> >> >> > parameterize with environment variables. I don't see this
> >>> >> >> >> > described in our docs, but there's a little detail at the
> >>> >> >> >> > option declaration [1].
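Tying the environment-variable parameterization to the dynamic.aurora example above: a queue consumer could launch one single-task job per work item by invoking the aurora client with -E bindings. The config filename, job-key format, and binding names below are taken from the thread's example; the helper functions themselves are a hypothetical sketch:

```python
import subprocess

def build_create_command(work_item, resource_profile,
                         config='dynamic.aurora'):
    """Assemble the aurora-create invocation from the thread's example."""
    job_key = 'west/service-account-name/prod/process_%s' % work_item
    return ['aurora', 'create',
            '-E', 'work_item=%s' % work_item,
            '-E', 'resource_profile=%s' % resource_profile,
            job_key, config]

def launch(work_item, resource_profile):
    # Requires the aurora client on PATH; shown for illustration only.
    return subprocess.call(build_create_command(work_item, resource_profile))

# Usage (would actually invoke the client):
#   launch('repo_12345', 'graph_traversals')
```

Because each work item gets its own job key, repeated invocations don't collide the way re-creating a single named job would.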
> >>> >> >> >> > Another approach worth considering is work-stealing, using a
> >>> >> >> >> > single job as your pool of workers. I would find this easier
> >>> >> >> >> > to manage, but it would only be suitable if your work items
> >>> >> >> >> > are sufficiently uniform.
> >>> >> >> >> >
> >>> >> >> >> > Feel free to continue the discussion! We're also pretty active
> >>> >> >> >> > in our IRC channel if you'd prefer that medium.
> >>> >> >> >> >
> >>> >> >> >> > [1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
> >>> >> >> >> >
> >>> >> >> >> > -=Bill
> >>> >> >> >> >
> >>> >> >> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <br...@codeclimate.com> wrote:
> >>> >> >> >> >
> >>> >> >> >> >> Hello,
> >>> >> >> >> >>
> >>> >> >> >> >> I am considering Aurora for a key component of our
> >>> >> >> >> >> infrastructure. Awesome work being done here.
> >>> >> >> >> >>
> >>> >> >> >> >> My question is: How suitable is Aurora for running
> >>> >> >> >> >> short-lived tasks?
> >>> >> >> >> >>
> >>> >> >> >> >> Background: We (Code Climate) do static analysis of tens of
> >>> >> >> >> >> thousands of repositories every day. We run a variety of
> >>> >> >> >> >> forms of analysis, with heterogeneous resource requirements,
> >>> >> >> >> >> and thus our interest in Mesos.
> >>> >> >> >> >>
> >>> >> >> >> >> Looking at Aurora, a lot of the core features look very
> >>> >> >> >> >> helpful to us. Where I am getting hung up is figuring out how
> >>> >> >> >> >> to model short-lived tasks as tasks/jobs. Long-running
> >>> >> >> >> >> resource allocations are not really an option for us due to
> >>> >> >> >> >> the variation in our workloads.
> >>> >> >> >> >>
> >>> >> >> >> >> My first thought was to create a Task for each type of
> >>> >> >> >> >> analysis we run, and then start a new Job with the
> >>> >> >> >> >> appropriate Task every time we want to run analysis
> >>> >> >> >> >> (regulated by a queue). This doesn't seem to work, though. I
> >>> >> >> >> >> can't `aurora create` the same `.aurora` file multiple times
> >>> >> >> >> >> with different Job names (as far as I can tell). Also, there
> >>> >> >> >> >> is the problem of how to customize each Job slightly (e.g. a
> >>> >> >> >> >> payload).
> >>> >> >> >> >>
> >>> >> >> >> >> An obvious alternative is to create a unique Task every time
> >>> >> >> >> >> we want to run work. This would result in tens of thousands
> >>> >> >> >> >> of tasks being created every day, and from what I can tell
> >>> >> >> >> >> Aurora does not intend to be used like that. (Please correct
> >>> >> >> >> >> me if I am wrong.)
> >>> >> >> >> >>
> >>> >> >> >> >> Basically, I would like to hook my job queue up to Aurora to
> >>> >> >> >> >> perform the actual work. There are a dozen different types of
> >>> >> >> >> >> jobs, each with different performance requirements. Every
> >>> >> >> >> >> time a job runs, it has a unique payload containing the
> >>> >> >> >> >> definition of the work that should be performed.
> >>> >> >> >> >>
> >>> >> >> >> >> Can Aurora be used this way? If so, what is the proper way to
> >>> >> >> >> >> model this with respect to Jobs and Tasks?
> >>> >> >> >> >>
> >>> >> >> >> >> Any/all help is appreciated.
> >>> >> >> >> >>
> >>> >> >> >> >> Thanks!
> >>> >> >> >> >>
> >>> >> >> >> >> -Bryan
> >>> >> >> >> >>
> >>> >> >> >> >> --
> >>> >> >> >> >> Bryan Helmkamp, Founder, Code Climate
> >>> >> >> >> >> br...@codeclimate.com / 646-379-1810 / @brynary
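Kevin's work_forever loop in workers.aurora is marked "pseudo-bash"; the same take/work/acknowledge cycle could be a small Python process run by each always-on worker instance. The queue client below is an in-memory stand-in (a real worker would talk to whatever queue service backs take_from_work_queue / tell_work_queue_finished), and the method names are hypothetical:

```python
import collections

class WorkQueue(object):
    """Illustrative in-memory stand-in for a shared work queue."""
    def __init__(self, items):
        self._pending = collections.deque(items)
        self.finished = []

    def take(self):
        # A real client would block or poll the queue service here.
        return self._pending.popleft() if self._pending else None

    def ack(self, item):
        # Mirrors tell_work_queue_finished in the pseudo-bash loop.
        self.finished.append(item)

def work_forever(queue, do_work):
    """Mirrors: while true; do take; do_work; ack; done."""
    while True:
        item = queue.take()
        if item is None:
            break  # a real worker would wait for more work instead
        do_work(item)
        queue.ack(item)

# Usage: each worker instance processes items until the queue drains.
queue = WorkQueue(['repo_1', 'repo_2'])
work_forever(queue, lambda item: None)  # do_work stubbed out
```

As Bill notes, this shape only pays off when work items are uniform enough that any worker can take any item.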