LazySimpleSerDe does not properly handle arrays / escape control characters
---------------------------------------------------------------------------
Key: HIVE-2333
URL: https://issues.apache.org/jira/browse/HIVE-2333
Project: Hive
Issue Type: Bug
Reporter: Jonathan Chang
LazySimpleSerDe, the default SerDe for Hive, is severely broken:
* Empty arrays are serialized as an empty string. Hence array(array()) is
indistinguishable from array(array(array())) and from array().
* Similarly, empty strings are serialized as an empty string. Hence array('')
is also indistinguishable from an empty array.
* If the serialized string equals the null sequence, it is ambiguous whether it
represents an array with a single null element or a null array.
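The three ambiguities above can be reproduced with a toy model of a delimiter-joined text encoding. This is only a sketch, not Hive's actual code: the serialize function is hypothetical, and the separators (\002, \003, \004 per nesting level) and the null sequence (\N) are assumed to match Hive's defaults.

```python
# Toy model of a delimiter-joined text encoding; NOT Hive's actual code.
# Assumed: one separator per nesting level and "\N" as the null sequence.
SEPARATORS = ["\x02", "\x03", "\x04"]
NULL_SEQUENCE = "\\N"


def serialize(value, level=0):
    """Serialize None, a string, or a (possibly nested) list of them."""
    if value is None:
        return NULL_SEQUENCE
    if isinstance(value, list):
        # Elements are joined with this level's separator and nothing marks
        # the start or end of the list, so the element count is lost.
        separator = SEPARATORS[level]
        return separator.join(serialize(item, level + 1) for item in value)
    return str(value)


empty_array = serialize([])              # array()
nested_once = serialize([[]])            # array(array())
nested_twice = serialize([[[]]])         # array(array(array()))
empty_string_element = serialize([""])   # array('')
null_array = serialize(None)             # a null array
array_of_null = serialize([None])        # an array holding one null

# Every "empty" shape collapses to the same serialized form.
print(repr(empty_array), repr(nested_once), repr(nested_twice),
      repr(empty_string_element))
# A null array and an array of one null also collide.
print(repr(null_array), repr(array_of_null))
```

In this model no deserializer can recover the original value, because the information is already gone from the serialized string.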
It also does not escape control characters in string values:
> select array('foo\002bar') from tmp;
...
["foo","bar"]
> select array('foo\001bar') from tmp;
...
["foo"]
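One plausible reading of the two results above, sketched under the assumption that the row is written with the default field separator \001 and collection item separator \002 and that no escaping is applied. The helper names are hypothetical and this is not Hive's implementation.

```python
# Toy round trip through an unescaped, delimiter-joined row format.
# Assumed defaults: \001 separates columns, \002 separates array items.
FIELD_SEPARATOR = "\x01"
ITEM_SEPARATOR = "\x02"


def serialize_row(columns):
    """Each column is a list of strings; no characters are escaped."""
    return FIELD_SEPARATOR.join(ITEM_SEPARATOR.join(items) for items in columns)


def deserialize_row(line):
    """Split the line back into columns, then each column into items."""
    return [field.split(ITEM_SEPARATOR) for field in line.split(FIELD_SEPARATOR)]


def first_column_round_trip(items):
    """Write a one-column row and read the first column back."""
    return deserialize_row(serialize_row([items]))[0]


# An embedded \002 is read back as an item boundary.
split_on_item_separator = first_column_round_trip(["foo\x02bar"])
# An embedded \001 is read back as a column boundary, so "bar" lands in a
# second column that the query never selects.
split_on_field_separator = first_column_round_trip(["foo\x01bar"])

print(split_on_item_separator)   # ['foo', 'bar']
print(split_on_field_separator)  # ['foo']
```

Escaping the separator characters on write and unescaping them on read would keep such values intact in this model.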