[ 
https://issues.apache.org/jira/browse/HIVE-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700215#comment-13700215
 ] 

Edward Capriolo commented on HIVE-2333:
---------------------------------------

Interesting. I am not sure what the semantics should be. Protobuf, for example,
does not support null arrays; a null array is treated as empty. Which SerDes
support which complex types is an interesting question that I do not know the
answer to. It would be great to have a table of Lazy, Thrift, Avro, ORC, and
RCFile and determine exactly what each one supports.
                
> LazySimpleSerDe does not properly handle arrays / escape control characters
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-2333
>                 URL: https://issues.apache.org/jira/browse/HIVE-2333
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jonathan Chang
>            Priority: Critical
>
> LazySimpleSerDe, the default SerDe for Hive, is severely broken:
> * Empty arrays are serialized as an empty string. Hence array(array()) is 
> indistinguishable from array(array(array())) and from array().
> * Similarly, empty strings are serialized as an empty string. Hence array('') 
> is also indistinguishable from an empty array.
> * If the serialized string equals the null sequence, then it is ambiguous as 
> to whether it is an array with a single null element or a null array.
> It also does not do well with control characters:
> > select array('foo\002bar') from tmp;
> ...
> ["foo","bar"]
> > select array('foo\001bar') from tmp;
> ...
> ["foo"]
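
The ambiguities described in the issue can be reproduced with a toy model of a
delimiter-based text encoding. This is only a sketch, not Hive's actual code:
the separator bytes (\x01 between fields, \x02 and up for nested collection
levels), the "\N" null sequence, and the function names serialize and
deserialize_first_column_as_array are all assumptions made for illustration.

```python
# Toy model of a delimiter-based text encoding. NOT LazySimpleSerDe itself;
# the separators and null sequence below are assumed for illustration.
FIELD_SEP = "\x01"                          # assumed separator between columns
LEVEL_SEPS = ["\x02", "\x03", "\x04"]       # assumed separators per nesting level
NULL_SEQUENCE = "\\N"                       # assumed null marker (backslash + N)


def serialize(value, depth=0):
    """Encode None, a string, or a (nested) list of those, with no escaping."""
    if value is None:
        return NULL_SEQUENCE
    if isinstance(value, list):
        # Joining zero elements yields "", the same bytes as an empty string.
        return LEVEL_SEPS[depth].join(serialize(v, depth + 1) for v in value)
    return value  # strings are written as-is, control characters included


def deserialize_first_column_as_array(row):
    """Read the first column of a row as a flat array of strings."""
    first_field = row.split(FIELD_SEP)[0]   # anything after \x01 is another column
    return first_field.split(LEVEL_SEPS[0])
```

In this model, serialize([]), serialize([""]), serialize([[]]) and
serialize([[[]]]) all produce the empty string, and serialize(None) and
serialize([None]) both produce the null sequence. Because nothing is escaped,
a string element containing \x02 is read back as two elements, and one
containing \x01 is cut off at the field boundary, which matches the two query
results quoted above.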
