Noémi Pap-Takács created IMPALA-14117:
-----------------------------------------

             Summary: Select files only once for COMPUTE STATS with TABLESAMPLE clause
                 Key: IMPALA-14117
                 URL: https://issues.apache.org/jira/browse/IMPALA-14117
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Noémi Pap-Takács


For COMPUTE STATS with a TABLESAMPLE clause, the files to sample are selected 
twice: first during analysis, then again in the SCAN plan nodes. The first 
selection is necessary because COMPUTE STATS must compute the sample in order 
to set 'effectiveSamplePerc_', which is used in stats extrapolation to get 
more precise stats.

We could probably improve this by always calculating the file sample once, 
during analysis, and then reusing the result in the SCAN plan node.
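
To illustrate the idea, here is a rough, language-agnostic sketch in Python of 
"select once, reuse everywhere". It is not Impala's frontend code: FileDesc, 
select_sample_files and SampleContext are hypothetical names, and the selection 
rule (shuffle the files, then take them until the requested share of bytes is 
covered) is only an assumed stand-in for the real sampling logic. The point is 
that both consumers call get(), and the selection itself runs at most once.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FileDesc:
    """Hypothetical stand-in for a table file: a path and a size."""
    path: str
    size_bytes: int


def select_sample_files(
    files: List[FileDesc], sample_perc: float, seed: int
) -> Tuple[List[FileDesc], float]:
    """Assumed selection rule: take files in random order until about
    sample_perc percent of the total bytes is covered.

    Returns the selected files and the effective sample fraction in [0, 1],
    which can differ from the requested one because whole files are taken.
    """
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return [], 0.0
    target = total * sample_perc / 100.0
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    selected: List[FileDesc] = []
    picked = 0
    for f in shuffled:
        if picked >= target:
            break
        selected.append(f)
        picked += f.size_bytes
    return selected, picked / total


class SampleContext:
    """Computes the file sample at most once and returns the same result
    to every caller (analysis first, then the scan node)."""

    def __init__(self, files: List[FileDesc], sample_perc: float, seed: int):
        self._files = files
        self._sample_perc = sample_perc
        self._seed = seed
        self._result: Optional[Tuple[List[FileDesc], float]] = None
        self.selection_runs = 0  # instrumentation for this sketch only

    def get(self) -> Tuple[List[FileDesc], float]:
        if self._result is None:
            self._result = select_sample_files(
                self._files, self._sample_perc, self._seed
            )
            self.selection_runs += 1
        return self._result
```

In this sketch, analysis would call get() to read the effective sample 
fraction, and the scan node would later call get() on the same object to obtain 
the file list, so the two always agree and the work is not repeated.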

In other cases, e.g. SELECT * FROM t TABLESAMPLE SYSTEM(10), the sample is 
calculated only in the SCAN plan nodes.


