[ 
https://issues.apache.org/jira/browse/HIVE-23095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071690#comment-17071690
 ] 

Zoltan Haindrich commented on HIVE-23095:
-----------------------------------------

The 
[getSize()|https://github.com/apache/hive/blob/d2ad5b061706a1d3cd55e59c769ed4f2af01cdbe/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java#L152]
 method was changed in HIVE-19578 to also count the tempList size, which makes 
{{getSize}} overestimate the actual size. Because the SPARSE/DENSE switch 
happens once a threshold is exceeded, the switch can be triggered with far 
fewer values than intended [in 
HyperLogLog.add|https://github.com/apache/hive/blob/d2ad5b061706a1d3cd55e59c769ed4f2af01cdbe/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L261].
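
To illustrate the effect, here is a minimal toy model; it is not the Hive code. The class name, the method names, the buffer capacity and the threshold of 50 are all hypothetical. It only assumes that a buffer of not-yet-merged entries can hold duplicates, and that a size computed as "merged entries + buffered entries" is compared against a switch threshold.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT the Hive HLLSparseRegister implementation.
// All names and numbers here are hypothetical.
class SparseRegisterSketch {
    // Stands in for the merged sparse map (distinct keys only).
    private final Set<Integer> merged = new HashSet<>();
    // Buffer of not-yet-merged keys; may contain duplicates.
    private final List<Integer> tempList = new ArrayList<>();
    private final int tempListCapacity;

    public SparseRegisterSketch(int tempListCapacity) {
        this.tempListCapacity = tempListCapacity;
    }

    public void add(int key) {
        // Merge the buffer only when it is full.
        if (tempList.size() == tempListCapacity) {
            mergeTempList();
        }
        tempList.add(key);
    }

    public void mergeTempList() {
        merged.addAll(tempList);
        tempList.clear();
    }

    // Counts every buffered entry as if it were distinct.
    public int sizeIncludingTempList() {
        return merged.size() + tempList.size();
    }

    // Number of distinct keys actually held.
    public int distinctSize() {
        Set<Integer> all = new HashSet<>(merged);
        all.addAll(tempList);
        return all.size();
    }

    public static void main(String[] args) {
        int threshold = 50; // hypothetical switch threshold
        SparseRegisterSketch reg = new SparseRegisterSketch(100);
        for (int i = 0; i < 70; i++) {
            reg.add(i % 10); // 70 inserts, only 10 distinct keys
        }
        // The buffered size exceeds the threshold, the distinct size does not.
        System.out.println("size incl. tempList = " + reg.sizeIncludingTempList()
                + ", would switch: " + (reg.sizeIncludingTempList() > threshold));
        System.out.println("distinct size = " + reg.distinctSize()
                + ", would switch: " + (reg.distinctSize() > threshold));
    }
}
```

In this sketch, a check against the buffered size would switch encodings after 70 inserts even though only 10 distinct keys are present; merging the buffer before checking the size would avoid that.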


> NDV might be overestimated for a table with ~70 value
> -----------------------------------------------------
>
>                 Key: HIVE-23095
>                 URL: https://issues.apache.org/jira/browse/HIVE-23095
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>
> Uncovered while looking into HIVE-23082:
> https://issues.apache.org/jira/browse/HIVE-23082?focusedCommentId=17067773&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17067773



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
