[ https://issues.apache.org/jira/browse/HIVE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046362#comment-13046362 ]
Krishna Kumar commented on HIVE-956:
------------------------------------

bq. can warnedOnceNullMapKey be removed?

It is easy to remove warnedOnceNullMapKey:
- if it is OK to log a warning message every time a null map key is encountered, or
- if it is OK to log a warning message only once per process execution (by making it a class static).

The current behavior is to log a warning message once per instance of LazyBinarySerde. If we want to retain that behavior, should it become a parameter (or use a more complicated mechanism, such as a callback or a thread-local)?

bq. A 0 should mean an empty string. '\N' means null in Hive. Can you take a look at how LazyBinarySerde handles null value, and do the same thing here.

Not sure I understand. Isn't the serde free to encode null and empty values any way it sees fit? '\N' means null only in the context of specific serdes, for instance the columnar serde. LazyBinarySerde uses one null byte for every 8 fields to encode nulls (and a string length as part of the data to encode empty strings). IMO, neither '\N' nor the null byte is well suited to lazybinarycolumnar: the former because it brings escaping complexities, and the latter because storage is now by column, not by row.

I have taken the approach that a 0-length column cell value indicates null (nulls being a very common case, they should have minimal overhead). For empty strings, encoding the string length as part of the cell value is still an option, but I think it adds too much overhead for the non-empty cells (as shown in my tests on the same specific dataset).

The implementation is fine, I think. It first checks whether the field is a primitive (for non-primitives, the input byte stream length is also the data length), and then whether the field is a string of length 1 whose value is the special marker, etc.

Will do the mapequalcomparer splitting.
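A minimal Java sketch of the cell encoding described in the comment may make the null/empty-string rules concrete. It is an illustration under stated assumptions, not code from the HIVE-956 patch: the class `ColumnarCellCodec`, its helpers, the `isStringField` flag (standing in for the primitive/string type check) and the marker value `0xBF` are all hypothetical. `0xBF` is used here only because it is a UTF-8 continuation byte, so no valid UTF-8 encoding of a non-empty string consists of that single byte.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the column-cell encoding discussed above
 * (not code from the HIVE-956 patch):
 *   - a 0-length cell means NULL;
 *   - for a string field, a 1-byte cell holding EMPTY_STRING_MARKER means "";
 *   - any other cell's byte length is its data length.
 */
final class ColumnarCellCodec {

    // Assumed marker value. 0xBF is a UTF-8 continuation byte, so no valid
    // UTF-8 encoding of a non-empty string is exactly this one byte.
    static final byte EMPTY_STRING_MARKER = (byte) 0xBF;

    private ColumnarCellCodec() {}

    /** Encodes a string cell: null -> 0 bytes, "" -> marker, else UTF-8 bytes. */
    static byte[] encodeString(String value) {
        if (value == null) {
            return new byte[0]; // 0-length cell means NULL
        }
        if (value.isEmpty()) {
            return new byte[] {EMPTY_STRING_MARKER};
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Data length of a non-null cell stored at bytes[start, start + length).
     * isStringField is a hypothetical stand-in for the type check: only
     * string cells are inspected for the marker; others use the cell length.
     */
    static int dataLength(boolean isStringField, byte[] bytes, int start, int length) {
        if (isStringField && length == 1 && bytes[start] == EMPTY_STRING_MARKER) {
            return 0; // the marker encodes the empty string
        }
        return length;
    }

    /** Decodes a string cell; returns null for a 0-length cell. */
    static String decodeString(byte[] bytes, int start, int length) {
        if (length == 0) {
            return null; // 0-length cell means NULL
        }
        int n = dataLength(true, bytes, start, length);
        return new String(bytes, start, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String s : new String[] {null, "", "abc"}) {
            byte[] cell = encodeString(s);
            String back = decodeString(cell, 0, cell.length);
            System.out.println(cell.length + " bytes -> "
                    + (back == null ? "NULL" : "\"" + back + "\""));
        }
    }
}
```

In this sketch NULL costs 0 bytes, the empty string 1 byte, and `"abc"` its 3 UTF-8 bytes with no length prefix, which is the trade-off argued for above: nulls stay free and non-empty cells carry no per-value overhead.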
> Add support of columnar binary serde
> ------------------------------------
>
>                 Key: HIVE-956
>                 URL: https://issues.apache.org/jira/browse/HIVE-956
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: Krishna Kumar
>         Attachments: HIVE.956.patch.0, HIVE.956.patch.1, HIVE.956.patch.2
>