[ https://issues.apache.org/jira/browse/HIVE-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701158#comment-13701158 ]
Jonathan Chang commented on HIVE-2333:
--------------------------------------

The key difference from protobuf is that the serdes should satisfy the Hive contract. I'm not sure what the best way to express the contract is, but if a serde does not support a certain condition, then at the very least a warning needs to be shown.

> LazySimpleSerDe does not properly handle arrays / escape control characters
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-2333
>                 URL: https://issues.apache.org/jira/browse/HIVE-2333
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jonathan Chang
>            Priority: Critical
>
> LazySimpleSerDe, the default SerDe for Hive, is severely broken:
> * Empty arrays are serialized as an empty string. Hence array(array()) is
> indistinguishable from array(array(array())) and from array().
> * Similarly, empty strings are serialized as an empty string. Hence array('')
> is also indistinguishable from an empty array.
> * If the serialized string equals the null sequence, then it is ambiguous as
> to whether it is an array with a single null element or a null array.
> It also does not do well with control characters:
> > select array('foo\002bar') from tmp;
> ...
> ["foo","bar"]
> > select array('foo\001bar') from tmp;
> ...
> ["foo"]
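
The ambiguities listed in the issue description can be reproduced with a toy delimiter-based serializer. The sketch below is not the LazySimpleSerDe code. It is a minimal Python model that assumes one separator per nesting level (\001 between columns, then \002, \003, \004 inside collections), a null sequence of \N, and no escaping. The names serialize, serialize_row, SEPARATORS, FIELD_SEP and NULL_SEQUENCE are hypothetical and exist only for this illustration.

```python
# Toy model of a delimiter-based text serializer. Illustration only:
# the separators, the null sequence and the lack of escaping are assumptions.
FIELD_SEP = "\x01"                      # assumed separator between columns
SEPARATORS = ["\x02", "\x03", "\x04"]   # assumed separators per array nesting level
NULL_SEQUENCE = "\\N"                   # assumed null marker (backslash, N)


def serialize(value, level=0):
    """Serialize a string, None, or a nested list of those, without escaping."""
    if value is None:
        return NULL_SEQUENCE
    if isinstance(value, list):
        # Joining zero elements, or one empty element, both give "".
        return SEPARATORS[level].join(serialize(v, level + 1) for v in value)
    return str(value)


def serialize_row(columns):
    """Join already-typed column values with the field separator."""
    return FIELD_SEP.join(serialize(c) for c in columns)


if __name__ == "__main__":
    # Empty arrays at any depth and array('') all collapse to the empty string.
    print(repr(serialize([])), repr(serialize([[]])),
          repr(serialize([[[]]])), repr(serialize([""])))

    # A null array and an array holding a single null look the same.
    print(serialize(None) == serialize([None]))

    # An unescaped \002 inside a string looks like an element boundary.
    print(serialize(["foo\x02bar"]) == serialize(["foo", "bar"]))

    # An unescaped \001 ends the column early when the row is split again.
    print(serialize_row([["foo\x01bar"]]).split(FIELD_SEP))
```

In this model every collision comes from the same two choices: separators are not escaped inside values, and an empty collection has no encoding of its own. Whether the real SerDe loses data for exactly these reasons is not confirmed here, but the toy output is consistent with the query results quoted in the issue.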