[ https://issues.apache.org/jira/browse/HIVE-22979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053629#comment-17053629 ]
Jesus Camacho Rodriguez commented on HIVE-22979:
------------------------------------------------

Left a couple of minor comments in the PR.

Just another idea. I understand we may not want to expose this number in explain, as it would change all plans, but maybe we want to do it in extended explain, as it would help debug any issues? There are already two different methods for user vs default/extended explain.
{code}
...
  @Override
  @Explain(displayName = "Statistics")
  public String toString() {
...
  @Explain(displayName = "Statistics", explainLevels = { Level.USER })
  public String toUserLevelExplainString() {
...
{code}
We could create a third one, specific to extended explain, that includes this number. We can tackle that in a follow-up though.

Other than that, LGTM, +1

> Support total file size in statistics annotation
> ------------------------------------------------
>
>                 Key: HIVE-22979
>                 URL: https://issues.apache.org/jira/browse/HIVE-22979
>             Project: Hive
>          Issue Type: Improvement
>   Affects Versions: 4.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-22979.1.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive statistics annotation provides estimated Statistics for each operator.
> The data size provided in TableScanOperator is raw data size (after
> decompression and decoding), but there are some optimizations that can be
> performed based on total file size on disk (scan cost estimation).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)