[ https://issues.apache.org/jira/browse/IMPALA-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032507#comment-18032507 ]

ASF subversion and git services commented on IMPALA-10204:
----------------------------------------------------------

Commit a8618c6a65f054bd8dac578ec4a5e1eacaaab7e3 in impala's branch 
refs/heads/master from Abhishek Rawat
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a8618c6a6 ]

IMPALA-10204: Make AdmitQuery params more efficient

The admission request may contain lineage graphs and other
data that the admission control service doesn't need.
For example, the admission control service previously held
onto the full TQueryExecRequest object for the entire
lifetime of a query, even after the admission decision was
complete. This led to unnecessary memory consumption.

This commit introduces two optimizations for reducing the
memory footprint:
1.  A lightweight copy of TQueryExecRequest is now created
on the client side before it is sent to the admission control
service. Fields that are not required for admission
decisions (e.g., query_plan, lineage_graph) are cleared from
this copy.
2.  The AdmissionState now uses a unique_ptr to manage the
TQueryExecRequest. This allows the object's memory to be
explicitly released as soon as the query schedule is generated
and the request object is no longer needed.
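The two optimizations above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Impala's code: ExecRequestSketch, AdmissionStateSketch, MakeAdmissionCopy, and GenerateScheduleAndRelease are simplified stand-ins assumed for the example (the real TQueryExecRequest is a Thrift-generated struct with many more fields), and the "schedule" computed here is only a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the Thrift-generated TQueryExecRequest.
struct ExecRequestSketch {
  std::string query_plan;              // Plan text: not needed for admission.
  std::string lineage_graph;           // Lineage data: not needed for admission.
  int64_t per_host_mem_estimate = 0;   // Used by the admission decision.
  std::vector<std::string> host_list;  // Used to build the schedule.
};

// Optimization 1 (client side): copy the request, then clear the fields the
// admission service never reads, so they add little to the RPC payload.
ExecRequestSketch MakeAdmissionCopy(const ExecRequestSketch& full) {
  ExecRequestSketch copy = full;
  copy.query_plan.clear();
  copy.lineage_graph.clear();
  return copy;
}

// Hypothetical stand-in for the service-side AdmissionState.
struct AdmissionStateSketch {
  std::unique_ptr<ExecRequestSketch> request;
  int64_t schedule_mem_bytes = 0;  // Placeholder for the generated schedule.
};

// Optimization 2 (service side): once the schedule exists, reset the
// unique_ptr so the request is freed instead of living as long as the query.
void GenerateScheduleAndRelease(AdmissionStateSketch& state) {
  assert(state.request != nullptr);
  // Placeholder "schedule": total estimate across hosts (not Impala's logic).
  state.schedule_mem_bytes =
      state.request->per_host_mem_estimate *
      static_cast<int64_t>(state.request->host_list.size());
  state.request.reset();
}
```

Holding the request behind a unique_ptr makes its lifetime explicit: the memory is freed at a well-defined point (right after scheduling) rather than when the query is torn down.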

During a customized high-concurrency TPCDS run, peak memory
usage in admissiond was around 2GB without this change and
less than half of that with it.

Tests:
Passed exhaustive tests.

Change-Id: I1ba5e8818336bd1fc3ad604a0acee5eb7a1116c4
Reviewed-on: http://gerrit.cloudera.org:8080/23546
Reviewed-by: Michael Smith
Tested-by: Impala Public Jenkins
Reviewed-by: Abhishek Rawat


> Evaluate AdmitQuery params for efficiency
> -----------------------------------------
>
>                 Key: IMPALA-10204
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10204
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Yida Wu
>            Priority: Critical
>         Attachments: query_tpcds.sql
>
>
> In the first version of the AdmissionControlService, we're sending the entire 
> TQueryExecRequest/TQueryOptions as a sidecar to the admission controller. 
> There are various things contained in the TQueryExecRequest/TQueryOptions 
> that are not actually needed by the admission controller, and sending them 
> increases network load and query running time unnecessarily.
> We should evaluate how much of a perf impact there is due to this and how 
> much could actually be removed.
> Some small things may be non-trivial to remove and ultimately not worth it.
> For example, the tree of TPlanNodes contains some info needed by the admission
> controller (e.g., memory estimates) and some that is not (e.g., runtime
> filter descriptors). Making two parallel trees, one with only
> admission-required data, would require either extensive refactoring in the
> planner or wasted work in the coordinator copying the required parts out of
> what the planner returns, and may be too complicated or introduce too much
> other overhead to be worth it.
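The coordinator-side alternative weighed in the description, copying the admission-relevant parts out of what the planner returns, might look roughly like the sketch below. PlanNodeSketch, AdmissionNodeInfo, and CollectAdmissionInfo are hypothetical simplifications, not Impala's TPlanNode types, and the sketch flattens the tree into a list rather than building a second parallel tree. What it illustrates is the cost side of the trade-off: an extra per-query tree walk and copy in the coordinator.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plan node mixing admission-relevant and irrelevant data.
// (std::vector of an incomplete element type is allowed since C++17.)
struct PlanNodeSketch {
  int node_id = 0;
  int64_t mem_estimate_bytes = 0;                 // Needed by admission control.
  std::vector<std::string> runtime_filter_descs;  // Not needed by admission control.
  std::vector<PlanNodeSketch> children;
};

// Compact record holding only the fields admission control would read.
struct AdmissionNodeInfo {
  int node_id;
  int64_t mem_estimate_bytes;
};

// Pre-order walk that copies out just the admission-relevant fields,
// producing a flat list instead of a second parallel tree.
void CollectAdmissionInfo(const PlanNodeSketch& node,
                          std::vector<AdmissionNodeInfo>* out) {
  assert(out != nullptr);
  out->push_back({node.node_id, node.mem_estimate_bytes});
  for (const PlanNodeSketch& child : node.children) {
    CollectAdmissionInfo(child, out);
  }
}
```

Whether this pays off depends on how large the dropped fields are relative to the cost of the extra walk, which is the measurement the issue asks for.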



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
