Noémi Pap-Takács created IMPALA-14117:
-----------------------------------------
Summary: Select files only once for COMPUTE STATS with TABLESAMPLE clause
Key: IMPALA-14117
URL: https://issues.apache.org/jira/browse/IMPALA-14117
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Noémi Pap-Takács
For COMPUTE STATS with a TABLESAMPLE clause, the files to sample are selected
twice: first during analysis, then again in the SCAN plan nodes. The first
selection is necessary because COMPUTE STATS must compute the sample in order to
set 'effectiveSamplePerc_', which is used during stats extrapolation to produce
more precise stats.
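To illustrate why analysis needs the actual sample and not just the requested percentage, here is a minimal sketch, assuming whole files are picked at random (seeded shuffle) until their combined size reaches the requested share of the table's bytes. The effective fraction is then the bytes actually chosen divided by the total, which usually overshoots the requested percentage. All names (FileSampleSketch, DataFile, selectFiles, extrapolate) are hypothetical and are not Impala's actual classes or methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FileSampleSketch {
  /** Hypothetical stand-in for a data file: a path and its size in bytes. */
  record DataFile(String path, long sizeBytes) {}

  /** Sampling result: the chosen files and the fraction (0..1) of bytes they cover. */
  record Sample(List<DataFile> files, double effectiveFraction) {}

  /**
   * Shuffles the files with a fixed seed, then takes whole files until their
   * combined size reaches samplePerc percent of the total bytes.
   */
  static Sample selectFiles(List<DataFile> files, double samplePerc, long seed) {
    long totalBytes = 0;
    for (DataFile f : files) totalBytes += f.sizeBytes();
    List<DataFile> shuffled = new ArrayList<>(files);
    Collections.shuffle(shuffled, new Random(seed)); // same seed -> same order
    long targetBytes = (long) Math.ceil(totalBytes * samplePerc / 100.0);
    List<DataFile> chosen = new ArrayList<>();
    long chosenBytes = 0;
    for (DataFile f : shuffled) {
      if (chosenBytes >= targetBytes) break;
      chosen.add(f);
      chosenBytes += f.sizeBytes();
    }
    // Whole files are taken, so this is usually larger than samplePerc / 100.
    double effective = totalBytes == 0 ? 0.0 : (double) chosenBytes / totalBytes;
    return new Sample(chosen, effective);
  }

  /** Scales a value measured on the sample (e.g. a row count) up to the whole table. */
  static long extrapolate(long sampledValue, double effectiveFraction) {
    return effectiveFraction == 0 ? 0 : Math.round(sampledValue / effectiveFraction);
  }

  public static void main(String[] args) {
    List<DataFile> files = List.of(
        new DataFile("f1", 100), new DataFile("f2", 100),
        new DataFile("f3", 100), new DataFile("f4", 100));
    Sample s = selectFiles(files, 10, 42L);
    // Requested 10%, but one whole 100-byte file out of 400 bytes gives 0.25.
    System.out.println(s.files().size() + " file(s), effective fraction "
        + s.effectiveFraction());
  }
}
```

Extrapolating with the requested 10% instead of the effective 25% would overestimate table-level stats by a factor of 2.5 in this example, which is why the selection has to happen during analysis.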
This could probably be improved by always calculating the file sample once,
during analysis, and then reusing the result in the SCAN plan node.
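A minimal sketch of the proposed "compute once, reuse" flow, assuming the sample can be held in a lazily computed, cached value that analysis populates and the scan node later reads. SampleOnceSketch, Memo, ScanNode and selectFiles are hypothetical names, the selection itself is a trivial placeholder (first n files), and the cache is not thread-safe; this only demonstrates that the selection runs a single time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SampleOnceSketch {
  /** Counts how many times the (hypothetically expensive) file selection ran. */
  static int selectionRuns = 0;

  /** Placeholder selection: takes the first n files. Real sampling would be random. */
  static List<String> selectFiles(List<String> allFiles, int n) {
    selectionRuns++;
    return new ArrayList<>(allFiles.subList(0, Math.min(n, allFiles.size())));
  }

  /** Computes a value on first access and returns the cached value afterwards. */
  static final class Memo<T> implements Supplier<T> {
    private final Supplier<T> compute;
    private T value;
    private boolean done;

    Memo(Supplier<T> compute) { this.compute = compute; }

    @Override
    public T get() {
      if (!done) {
        value = compute.get();
        done = true;
      }
      return value;
    }
  }

  /** Hypothetical scan node that reads the sample chosen during analysis. */
  static final class ScanNode {
    private final Supplier<List<String>> sample;

    ScanNode(Supplier<List<String>> sample) { this.sample = sample; }

    List<String> filesToScan() { return sample.get(); }
  }

  public static void main(String[] args) {
    List<String> files = List.of("f1", "f2", "f3", "f4");
    Memo<List<String>> sample = new Memo<>(() -> selectFiles(files, 2));
    // Analysis (e.g. computing effectiveSamplePerc_) triggers the selection...
    int sampledCount = sample.get().size();
    // ...and the scan node reuses that result instead of selecting again.
    ScanNode scan = new ScanNode(sample);
    System.out.println(sampledCount + " " + scan.filesToScan() + " runs=" + selectionRuns);
  }
}
```

Passing a Supplier keeps the scan node unaware of whether the sample was precomputed, so queries like the plain SELECT case could still compute it lazily at planning time.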
In other cases, e.g. SELECT * FROM t TABLESAMPLE SYSTEM(10), the sample is
calculated only in the SCAN plan nodes.