Pagination would be a no-op to the client because it would be opt-in only, so it would continue to fetch all the tasks in one request.
But you raise a good point in that presumably the client is also going to be blocked for several seconds while executing getTasksStatus for large jobs. Making the response more lightweight could be a big win there, but I would need a better understanding of how the client is using those responses first. On Tue, May 27, 2014 at 3:34 PM, Mark Chu-Carroll <mchucarr...@apache.org>wrote: > Interestingly, when we first expanded getTasksStatus, I didn't like the > idea, because I thought it would have exactly this problem! It's a *lot* of > information to get in a single burst. > > Have you checked what effect it'll have on the command-line client? In > general, the command-line has the context do a single API call, gathers the > results, and returns to a command implementation. It'll definitely > complicate things to add pagination. How much of an effect will it be? > > -Mark > > > > On Tue, May 27, 2014 at 5:32 PM, David McLaughlin <da...@dmclaughlin.com > >wrote: > > > As outlined in AURORA-458, using the new jobs page with a large (but > > reasonable) number of active and complete tasks can take a long time[1] > to > > render. Performance profiling done as part of AURORA-471 shows that the > > main factor in response time is rendering and returning the size of the > > uncompressed payload to the client. > > > > To that end, I think we have two approaches: > > > > 1) Add pagination to the getTasksStatus call. > > 2) Make the getTasksStatus response more lightweight. > > > > > > Pagination > > --------------- > > > > Pagination would be the simplest approach, and would scale to arbitrarily > > large numbers of tasks moving forward. The main issue with this is that > we > > need all active tasks to build the configuration summary at the top of > the > > job page. > > > > As a workaround we could add a new API call - getTaskConfigSummary - > which > > returns something like: > > > > > > struct ConfigGroup { > > 1: TaskConfig config > > 2: set<i32> instanceIds > > } > > > > struct ConfigSummary { > > 1: JobKey jobKey > > 2: set<ConfigGroup> groups > > } > > > > > > To support pagination without breaking the existing API, we could add > > offset and limit fields to the TaskQuery struct. > > > > > > Make getTasksStatus more lightweight > > ------------------------------------ > > > > getTasksStatus currently returns a list of ScheduledTask instances. The > > biggest (in terms of payload size) child object of a ScheduledTask is the > > TaskConfig struct, which itself contains an ExecutorConfig. > > > > I took a sample response from one of our internal production instances > and > > it turns out that around 65% of the total response size was for > > ExecutorConfig objects, and specifically the "cmdline" property of these. > > We currently do not use this information anywhere in the UI nor do we > > inspect it when grouping taskConfigs, and it would be a relatively easy > win > > to just drop these from the response. > > > > We'd still need this information for the config grouping, so we could add > > the response suggested for getTaskConfigSummary as another property and > > allow the client to reconcile these objects if it needs to: > > > > > > struct TaskStatusResponse { > > 1: list<LightweightTask> tasks > > 2: set<ConfigGroup> configSummary > > } > > > > > > This would significantly reduce the uncompressed payload size while still > > containing the same data. > > > > However, there is still a potentially significant part of a payload size > > remaining: task events (and these *are* currently used in the UI). We > could > > solve this by dropping task events from the LightweightTask struct too, > and > > fetching them lazily in batches. > > > > i.e. an API call like: > > > > > > getTaskEvents(1: JobKey key, 2: set<i32> instanceIds) > > > > > > Could return: > > > > > > struct TaskEventResult { > > 1: i32 instanceId > > 2: list<TaskEvent> taskEvents > > } > > > > struct TaskEventResponse { > > 1: JobKey key > > 2: list<TaskEventResult> results > > } > > > > > > Events could then only be fetched and rendered as the user clicks through > > the pages of tasks. > > > > > > Proposal > > ------------- > > > > I think pagination makes more sense here. It adds moderate overhead to > the > > complexity of the UI (this is purely due to our use of smart-table which > is > > not so server-side pagination friendly) but the client logic would > actually > > be simpler with the new getTaskConfigSummary api call. > > > > I do think there is value in considering whether the ScheduledTask struct > > needs to contain all of the information it does - but this could be done > as > > part of a separate or complimentary performance improvement ticket. > > > > > > > > > > [1] - At Twitter we observed 2000 active + 100 finished tasks having a > > payload size of 10MB which took 8~10 seconds to complete. > > >