[ https://issues.apache.org/jira/browse/HIVE-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438583#comment-16438583 ]
Thai Bui commented on HIVE-17481:
---------------------------------

[~prasanth_j] I've been trying to use the new WM feature. I noticed that nothing has changed even though I have configured a 'slow' pool (50% allocation) and a 'default' pool (the other 50%). Digging into the HS2 log, I've found that the GuaranteedTasksAllocator is not working: although queries are moved from the 'default' pool to the 'slow' pool when a certain trigger condition is met (bytes read), the LLAP tasks are never preempted. See the log line below.

`2018-04-15T04:40:59,216 ERROR [Workload management master] tez.GuaranteedTasksAllocator: No cluster information available to allocate; no guaranteed tasks will be used`

Taking a look at the code, I've found that this happens because `clusterState.hasClusterInfo()` is false, but I'm not sure why. How can this be fixed? (For reference, an illustrative sketch of the kind of resource plan DDL involved is included after the quoted issue description below.)

> LLAP workload management
> ------------------------
>
> Key: HIVE-17481
> URL: https://issues.apache.org/jira/browse/HIVE-17481
> Project: Hive
> Issue Type: New Feature
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: Workload management design doc.pdf
>
> This effort is intended to improve various aspects of cluster sharing for LLAP. Some of these are applicable to non-LLAP queries and may later be extended to all queries. Administrators will be able to specify and apply policies for workload management ("resource plans") that apply to the entire cluster, with only one resource plan being active at a time. The policies will be created and modified using new Hive DDL statements.
> The policies will cover:
> * Dividing the cluster into a set of (optionally, nested) query pools that are each allocated a fraction of the cluster, a set query parallelism, a resource sharing policy between queries, and potentially others like priority, etc.
> * Mapping the incoming queries into pools based on the query user, groups, explicit configuration, etc.
> * Specifying rules that perform actions on queries based on counter values (e.g. killing or moving queries).
> One would also be able to switch policies on a live cluster without (usually) affecting running queries, including e.g. to change policies for daytime and nighttime usage patterns, and other similar scenarios. The switches would be safe and atomic; versioning may eventually be supported.
> Some implementation details:
> * WM will only be supported in HS2 (for obvious reasons).
> * All LLAP query AMs will run in the "interactive" YARN queue and will be fungible between Hive pools.
> * We will use the concept of "guaranteed tasks" (also known as ducks) to enforce cluster allocation without a central scheduler and without compromising throughput. Guaranteed tasks preempt other (speculative) tasks and are distributed from HS2 to AMs, and from AMs to tasks, in accordance with percentage allocations in the policy. Each "duck" corresponds to a CPU resource on the cluster. The implementation will be isolated so as to allow different ones later.
> * In the future, we may consider improved task placement and late binding, similar to the ones described in the Sparrow paper, to work around potential hotspots/etc. that are not avoided with the decentralized scheme.
> * Only one HS2 will initially be supported to avoid split-brain workload management. We will also implement (in a tangential set of work items) active-passive HS2 recovery.
> Eventually, we intend to switch to a full active-active HS2 configuration with shared WM and Tez session pool (unlike the current case with 2 separate session pools).
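
For anyone trying to reproduce the setup described in the comment above, the resource plan can be sketched with the WM DDL roughly as follows. This is only an illustrative sketch: the plan, pool, and trigger names, the HDFS_BYTES_READ counter, the 1GB threshold, and the QUERY_PARALLELISM values are assumptions, not the reporter's exact configuration.

```sql
-- Illustrative sketch only: names, counter, and threshold are assumptions,
-- not the configuration actually used in this report.
CREATE RESOURCE PLAN wm_plan;

-- The new plan starts with a built-in 'default' pool; shrink it to half the
-- cluster and give the other half to a 'slow' pool.
ALTER POOL wm_plan.default SET ALLOC_FRACTION = 0.5, QUERY_PARALLELISM = 4;
CREATE POOL wm_plan.slow WITH ALLOC_FRACTION = 0.5, QUERY_PARALLELISM = 4;

-- Move heavy readers out of 'default' into 'slow' once they cross a bytes-read threshold.
CREATE TRIGGER wm_plan.big_read WHEN HDFS_BYTES_READ > 1GB DO MOVE TO slow;
ALTER TRIGGER wm_plan.big_read ADD TO POOL default;

-- Validate the plan and make it the active one for the cluster.
ALTER RESOURCE PLAN wm_plan ENABLE ACTIVATE;
```

With a plan like this active, HS2's workload management thread is expected to hand out guaranteed tasks ("ducks") to the two pools in proportion to their 0.5 allocation fractions; since each duck corresponds to a CPU/executor slot, a cluster with, say, 120 total executor slots would translate to roughly 60 guaranteed tasks per pool. The ERROR logged above indicates that no guaranteed tasks are being handed out at all, which matches the observation that LLAP tasks are never preempted.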