[ https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116169#comment-15116169 ]
Alan Gates commented on HIVE-12763:
-----------------------------------

bq. Alan>> In hbase_metastore_proto.proto, I'm surprised to see that you are storing the bit vectors as strings. Why not as bytes?
bq. Pengcheng>> I store bit vectors as strings because the default serialization and deserialization in Hive is Text (or String).

OK. I'm wondering whether your serialization and deserialization could be more efficient if you stored the bit vectors as a binary value rather than as text. But maybe it's not a big enough deal to worry about.

On the NOTICE file, I was wrong. You're only including it in the pom, not actually distributing the code, so it's fine. My mistake.

In NumDistinctValueEstimator.java:
# It would be good to have either a comment section at the head of the class that outlines the algorithm, or a link to somewhere that explains it. This will help future maintainers understand how this code works.
# Isn't generateHashForPCSA just generateHash with hashNum = 0? Why repeat the code?

The only one I think really needs to be fixed before commit is the commenting on the algorithm.

> Use bit vector to track NDV
> ---------------------------
>
>                 Key: HIVE-12763
>                 URL: https://issues.apache.org/jira/browse/HIVE-12763
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch, HIVE-12763.03.patch
>
>
> This will improve merging of per-partition stats. It will also help merge NDV for auto-gather column stats.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
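The name `generateHashForPCSA` suggests the estimator is based on PCSA (Probabilistic Counting with Stochastic Averaging, Flajolet and Martin). The class-level comment Alan asks for would describe an algorithm roughly like the minimal sketch below. This is not the patch's `NumDistinctValueEstimator`. The class name `PcsaSketch`, the 64-bit bit vectors, and the SplitMix64-style mixer standing in for the patch's hash are all assumptions made for illustration. The sketch shows three things. Each value sets one bit in one bit vector. NDV is estimated from the mean index of the lowest unset bit. Merging two partitions is a bitwise OR, which is why bit vectors merge per-partition NDV where plain NDV numbers cannot. `toBytes` shows the binary layout Alan suggests, at 8 bytes per bit vector.

```java
import java.nio.ByteBuffer;

/** Hypothetical PCSA sketch for illustration; not Hive's NumDistinctValueEstimator. */
final class PcsaSketch {
    // Flajolet-Martin correction constant for PCSA.
    private static final double PHI = 0.77351;
    private final int numBitVectors;
    private final long[] bitVectors; // one 64-bit bit vector per bucket

    PcsaSketch(int numBitVectors) {
        if (numBitVectors <= 0) throw new IllegalArgumentException("numBitVectors must be > 0");
        this.numBitVectors = numBitVectors;
        this.bitVectors = new long[numBitVectors];
    }

    // Assumed hash: SplitMix64 output function applied to (value + golden gamma).
    private static long mix64(long value) {
        long z = value + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix64(value);
        // Stochastic averaging: part of the hash picks the bucket, the rest picks the bit.
        int bucket = (int) Long.remainderUnsigned(h, numBitVectors);
        long rest = Long.divideUnsigned(h, numBitVectors);
        // Index of the least significant 1 bit; P(index = k) = 2^-(k+1).
        int pos = Math.min(Long.numberOfTrailingZeros(rest), 63);
        bitVectors[bucket] |= 1L << pos;
    }

    /** The sketch of a union of two columns is the bitwise OR of their sketches. */
    void merge(PcsaSketch other) {
        if (other.numBitVectors != numBitVectors) throw new IllegalArgumentException("size mismatch");
        for (int i = 0; i < numBitVectors; i++) bitVectors[i] |= other.bitVectors[i];
    }

    long estimate() {
        double sum = 0;
        for (long v : bitVectors) {
            sum += Long.numberOfTrailingZeros(~v); // index of the lowest 0 bit
        }
        double mean = sum / numBitVectors;
        return Math.round(numBitVectors / PHI * Math.pow(2, mean));
    }

    /** Binary form: 8 bytes per bit vector, big-endian. */
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES * numBitVectors);
        for (long v : bitVectors) buf.putLong(v);
        return buf.array();
    }

    static PcsaSketch fromBytes(byte[] bytes) {
        if (bytes.length == 0 || bytes.length % Long.BYTES != 0) throw new IllegalArgumentException("bad length");
        PcsaSketch s = new PcsaSketch(bytes.length / Long.BYTES);
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        for (int i = 0; i < s.numBitVectors; i++) s.bitVectors[i] = buf.getLong();
        return s;
    }

    public static void main(String[] args) {
        PcsaSketch p1 = new PcsaSketch(256);
        PcsaSketch p2 = new PcsaSketch(256);
        for (long v = 0; v < 100_000; v++) p1.add(v);      // partition 1
        for (long v = 50_000; v < 150_000; v++) p2.add(v); // partition 2, overlapping
        System.out.println("partition 1 NDV ~ " + p1.estimate());
        p1.merge(p2);
        System.out.println("merged NDV ~ " + p1.estimate() + " (true value 150000)");
    }
}
```

Because OR-ing the bit vectors gives exactly the bits that inserting both partitions' values would set, the merged estimate matches a single sketch built over the union. Summing or taking the max of two scalar NDV values cannot recover this. PCSA's relative error is roughly 0.78/sqrt(m) for m bit vectors, and the estimate is biased when the NDV is small relative to m.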