[ 
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116169#comment-15116169
 ] 

Alan Gates commented on HIVE-12763:
-----------------------------------

bq. Alan>> In hbase_metastore_proto.proto, I'm surprised to see that you are 
storing the bit vectors as strings. Why not as bytes?
bq. Pengcheng>> I store bit vectors as strings because the default 
serialization and deserialization format in Hive is Text (or String)
OK. I'm wondering whether your serialization and deserialization would be 
more efficient if you stored the bit vectors as binary values rather than 
text.  But maybe it's not a big enough deal to worry about.
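
To make the size difference concrete, here is a minimal sketch that encodes 
the same bit vector both ways.  It assumes a text form with one '0' or '1' 
character per bit, which may not match the format the patch actually uses; 
the class and method names are hypothetical and are not taken from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical helper, for illustration only; not code from the patch.
class BitVectorEncoding {

    // Text form (assumed): one '0' or '1' character per bit.
    static String toText(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    static BitSet fromText(String text) {
        BitSet bits = new BitSet(text.length());
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '1') {
                bits.set(i);
            }
        }
        return bits;
    }

    // Binary form: packed, eight bits per byte (suits a protobuf bytes field).
    static byte[] toBytes(BitSet bits) {
        return bits.toByteArray();
    }

    static BitSet fromBytes(byte[] packed) {
        return BitSet.valueOf(packed);
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(32);
        bits.set(0);
        bits.set(3);
        bits.set(17);

        String text = toText(bits, 32);
        byte[] packed = toBytes(bits);

        System.out.println("text bytes:   "
            + text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("packed bytes: " + packed.length);
    }
}
```

Under that assumption the text form costs one byte per bit and needs a 
character-by-character parse on read, while the packed form costs roughly one 
byte per eight bits.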

On the NOTICE file, I'm wrong.  You're just including it in the pom, not 
actually distributing the code, so it's fine.  My mistake.

In NumDistinctValueEstimator.java:
# It would be good to either have a comment section at the head of the class 
that outlines the algorithm, or perhaps a link to somewhere that explains it.  
This will help future maintainers understand how this code works.
# Isn't generateHashForPCSA just generateHash with hashNum = 0?  Why repeat the 
code?
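
To illustrate both points, here is a rough sketch of what I have in mind.  It 
is not the code in the patch: the class name, the method signatures, and the 
hash function are my assumptions, and the algorithm shown is textbook PCSA 
(Flajolet and Martin's probabilistic counting with stochastic averaging), 
which may differ in detail from what NumDistinctValueEstimator does.

```java
/**
 * Hypothetical PCSA sketch, for illustration only (not Hive's code).
 *
 * Algorithm outline:
 *  1. Hash each value to a non-negative int.
 *  2. Use hash % numVectors to pick one bit vector.
 *  3. In that vector, set bit rho, where rho is the number of trailing
 *     zero bits in hash / numVectors.
 *  4. Estimate NDV as numVectors / PHI * 2^R, where R is the average
 *     index of the lowest unset bit across all vectors.
 */
class PcsaSketch {
    private static final double PHI = 0.77351;  // Flajolet-Martin constant
    private static final int BITS_PER_VECTOR = 31;

    private final int[] bitVectors;

    PcsaSketch(int numVectors) {
        this.bitVectors = new int[numVectors];
    }

    // Assumed hash family: hashNum perturbs the input, which is then mixed
    // with the MurmurHash3 64-bit finalizer steps.
    int generateHash(long v, int hashNum) {
        long h = v + hashNum * 0x9E3779B97F4A7C15L;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return (int) (h & 0x7fffffffL);  // keep the result non-negative
    }

    // Delegates rather than repeating the hashing code.
    int generateHashForPCSA(long v) {
        return generateHash(v, 0);
    }

    void add(long v) {
        int hash = generateHashForPCSA(v);
        int index = hash % bitVectors.length;
        int rest = hash / bitVectors.length;
        int rho = Math.min(Integer.numberOfTrailingZeros(rest),
                           BITS_PER_VECTOR - 1);
        bitVectors[index] |= 1 << rho;
    }

    double estimate() {
        long sum = 0;
        for (int vector : bitVectors) {
            // Index of the lowest bit that is still zero in this vector.
            sum += Integer.numberOfTrailingZeros(~vector);
        }
        double mean = (double) sum / bitVectors.length;
        return bitVectors.length / PHI * Math.pow(2.0, mean);
    }

    public static void main(String[] args) {
        PcsaSketch sketch = new PcsaSketch(64);
        for (long v = 0; v < 10000; v++) {
            sketch.add(v);
        }
        System.out.println("estimated NDV: " + sketch.estimate());
    }
}
```

A header comment along these lines, plus the one-line delegation, is roughly 
what I am asking for.  Because adding a value only ORs bits into a vector, two 
sketches of this kind could in principle be merged by ORing their vectors 
element-wise.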

The only one I think really needs to be fixed before commit is the comment 
describing the algorithm.



> Use bit vector to track NDV
> ---------------------------
>
>                 Key: HIVE-12763
>                 URL: https://issues.apache.org/jira/browse/HIVE-12763
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch, 
> HIVE-12763.03.patch
>
>
> This will improve merging of per-partition stats. It will also help merge 
> NDV for auto-gathered column stats.



