[ 
https://issues.apache.org/jira/browse/HIVE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046362#comment-13046362
 ] 

Krishna Kumar commented on HIVE-956:
------------------------------------

bq. can warnedOnceNullMapKey be removed?

It is easy to remove warnedOnceNullMapKey if either of these is acceptable:
 - logging a warning message every time a null map key is encountered, or
 - logging a warning message only once per process execution (by making the 
   flag a static field).

The current behavior is to log a warning message once per instance of 
LazyBinarySerde. If we want to retain that behavior, the flag would probably 
have to be passed in as a parameter (or handled through a more complicated 
mechanism, such as a callback or a thread-local).
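
To make the trade-off concrete, here is a minimal Java sketch contrasting the 
current per-instance flag with a process-wide static flag. The class names 
(WarnOnceDemo, PerInstanceSerde, PerProcessSerde), the warning counters and the 
log messages are hypothetical illustrations, not Hive code; only the field name 
warnedOnceNullMapKey comes from the discussion above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Logger;

// Hypothetical sketch: these classes only illustrate the two warn-once scopes.
class WarnOnceDemo {
    private static final Logger LOG = Logger.getLogger(WarnOnceDemo.class.getName());

    // Current behavior: each serde instance warns at most once.
    static class PerInstanceSerde {
        private boolean warnedOnceNullMapKey = false;
        int warnings = 0; // number of warnings this instance has logged

        void onNullMapKey() {
            if (!warnedOnceNullMapKey) {
                warnedOnceNullMapKey = true;
                warnings++;
                LOG.warning("null map key encountered; further occurrences in this instance are not logged");
            }
        }
    }

    // Alternative: all instances in the JVM share one flag, so the warning is logged once per process.
    static class PerProcessSerde {
        private static final AtomicBoolean WARNED_ONCE_NULL_MAP_KEY = new AtomicBoolean(false);
        static int warnings = 0; // only incremented by the single caller that wins compareAndSet

        void onNullMapKey() {
            // compareAndSet succeeds for exactly one caller, across all instances and threads.
            if (WARNED_ONCE_NULL_MAP_KEY.compareAndSet(false, true)) {
                warnings++;
                LOG.warning("null map key encountered; further occurrences in this process are not logged");
            }
        }
    }

    public static void main(String[] args) {
        PerInstanceSerde first = new PerInstanceSerde();
        PerInstanceSerde second = new PerInstanceSerde();
        first.onNullMapKey();
        first.onNullMapKey();
        second.onNullMapKey();
        new PerProcessSerde().onNullMapKey();
        new PerProcessSerde().onNullMapKey();
        System.out.println("per-instance warnings: " + (first.warnings + second.warnings)
                + ", per-process warnings: " + PerProcessSerde.warnings);
    }
}
```

An AtomicBoolean is used for the static variant so that concurrent instances in 
the same JVM still log at most once; a plain static boolean would also do if an 
occasional duplicate warning under concurrency is acceptable.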

bq. A 0 should mean an empty string. '\N' means null in Hive. Can you take a 
look at how LazyBinarySerde handles null value, and do the same thing here.

Not sure I understand. Isn't a serde free to encode null/empty values any way it 
sees fit? '\N' means null only in the context of a specific serde, for instance 
the columnar serde. LazyBinarySerde uses a null byte for every 8 fields to encode 
nulls (and a string length as part of the data to encode empty strings). IMO, 
neither option is well suited to lazybinarycolumnar: '\N' would bring escaping 
complexities, and a null byte per 8 fields assumes row-wise storage, while the 
storage is now by columns, not by rows.

I have taken the approach that a 0-length column cell value indicates null 
(nulls are a very common case, so they should have minimal overhead). For empty 
strings, encoding the string length as part of the cell value is still possible, 
but I think that adds too much overhead for the non-empty cells (as shown in my 
tests on the same specific dataset).
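
A minimal Java sketch of this cell encoding for a string column follows. It 
assumes the storage format already records each cell's byte length, so a null 
costs no data bytes at all. It also assumes a one-byte marker for empty strings, 
here 0xBF, a UTF-8 continuation byte that can never be the complete UTF-8 
encoding of a real string. The class, method and constant names are 
hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the string-cell encoding; names and marker value are assumptions.
class CellEncodingSketch {
    // Assumed empty-string marker: 0xBF is a UTF-8 continuation byte, so a one-byte
    // cell holding it cannot be the UTF-8 encoding of any real string.
    static final byte EMPTY_STRING_MARKER = (byte) 0xBF;

    /** null -> zero bytes, "" -> the one-byte marker, anything else -> raw UTF-8, no length prefix. */
    static byte[] encodeStringCell(String value) {
        if (value == null) {
            return new byte[0]; // the cell length (assumed stored by the file format) alone signals null
        }
        if (value.isEmpty()) {
            return new byte[] {EMPTY_STRING_MARKER}; // keeps "" distinguishable from null
        }
        return value.getBytes(StandardCharsets.UTF_8); // non-empty cells pay no extra bytes
    }

    public static void main(String[] args) {
        for (String v : new String[] {null, "", "hive"}) {
            System.out.println(v + " -> " + Arrays.toString(encodeStringCell(v)));
        }
    }
}
```

The point of the design is that the common cases (nulls and non-empty strings) 
carry no per-cell overhead; only the rarer empty string pays one byte.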

I think the implementation handles this correctly. It first checks whether the 
field is a primitive (for non-primitives, the input byte stream length is also 
the data length), and then whether the field is a string of length 1 whose value 
is the special marker, and so on.
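
The check described above could be sketched in Java roughly as follows, also 
treating a zero-length cell as null per the approach described earlier. 
FieldKind, dataLength and decodeString are hypothetical names, the marker value 
0xBF is an assumption, and returning -1 for null is just a convention chosen for 
this sketch. The marker test is restricted to strings because, in a binary 
encoding where a TINYINT is stored as one raw byte, the value -65 is also the 
byte 0xBF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the length check; names, marker value and the -1 convention are assumptions.
class CellDecodingSketch {
    static final byte EMPTY_STRING_MARKER = (byte) 0xBF; // assumed empty-string marker

    enum FieldKind { STRING, OTHER_PRIMITIVE, NON_PRIMITIVE }

    /** Returns the logical data length of a cell, or -1 if the cell represents null. */
    static int dataLength(FieldKind kind, byte[] data, int start, int length) {
        if (length == 0) {
            return -1; // a zero-length cell encodes null
        }
        if (kind == FieldKind.NON_PRIMITIVE) {
            return length; // non-primitives: the stored byte length is the data length
        }
        if (kind == FieldKind.STRING && length == 1 && data[start] == EMPTY_STRING_MARKER) {
            return 0; // the lone marker byte encodes the empty string
        }
        return length; // other primitives, and strings that are not the marker
    }

    /** Decodes a string cell into null, "", or its UTF-8 contents. */
    static String decodeString(byte[] data, int start, int length) {
        int n = dataLength(FieldKind.STRING, data, start, length);
        if (n < 0) {
            return null;
        }
        return new String(data, start, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buf = {EMPTY_STRING_MARKER, 'h', 'i'};
        System.out.println("[" + decodeString(buf, 0, 0) + "]"); // prints [null]
        System.out.println("[" + decodeString(buf, 0, 1) + "]"); // prints []
        System.out.println("[" + decodeString(buf, 1, 2) + "]"); // prints [hi]
    }
}
```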

I will do the mapequalcomparer splitting.


> Add support of columnar binary serde
> ------------------------------------
>
>                 Key: HIVE-956
>                 URL: https://issues.apache.org/jira/browse/HIVE-956
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: Krishna Kumar
>         Attachments: HIVE.956.patch.0, HIVE.956.patch.1, HIVE.956.patch.2
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
