Why can't Hive load a plain string as binary from a CSV file?
https://issues.apache.org/jira/browse/HIVE-21626
Hive-1.2.2
hive> CREATE TABLE IF NOT EXISTS hivetable (
    > id int,
    > label boolean,
    > name string,
    > image binary,
    > autoLabel boolean)
    > row format delimited fields terminated by '|';
OK
Time taken: 0.068 seconds
hive> LOAD DATA LOCAL INPATH '/Users/xubo/Desktop/xubo/git/carbondata3/integration/spark-common-test/src/test/resources/binarystringdata2.csv' INTO TABLE hivetable;
Loading data to table default.hivetable
Table default.hivetable stats: [numFiles=1, totalSize=82]
OK
Time taken: 0.122 seconds
hive> select * from hivetable;
OK
2	false	2.png	i�	true
3	false	3.png	n*%�	false
1	true	1.png	^Ayard duty^B	true
The contents of binarystringdata2.csv are:
```
2|false|2.png|abc|true
3|false|3.png|biology|false
1|true|1.png|^Ayard duty^B|true
```
binarystringdata2.csv contains no \u0001 characters, like the over1k file in the Hive project.
For the value "abc" in the CSV, I expect to read back abc from Hive after loading the file, so why do I get "i�"? The bytes of abc are byte[] {97, 98, 99}. In org.apache.hadoop.hive.serde2.lazy.LazyBinary#decodeIfNeeded the value is treated as Base64 and decoded, which returns byte[] {105, -74}:
public static byte[] decodeIfNeeded(byte[] recv) {
  boolean arrayByteBase64 = Base64.isArrayByteBase64(recv);
  if (LOG.isDebugEnabled() && arrayByteBase64) {
    LOG.debug("Data only contains Base64 alphabets only so try to decode the data.");
  }
  return arrayByteBase64 ? Base64.decodeBase64(recv) : recv;
}
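
To make the effect easier to reproduce outside Hive, here is a minimal sketch of the same decision logic. It is not the Hive code: `java.util.Base64` stands in for the commons-codec decoder, and `looksLikeBase64` is a hypothetical, simplified replacement for `Base64.isArrayByteBase64` (as far as I know the real check also accepts `=` padding and whitespace). Under those assumptions, "abc" passes the check and is decoded, while "^Ayard duty^B" fails it and is returned unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class DecodeIfNeededSketch {

    // Hypothetical, simplified stand-in for commons-codec's Base64.isArrayByteBase64:
    // true only when every byte belongs to the Base64 alphabet (A-Z, a-z, 0-9, +, /).
    static boolean looksLikeBase64(byte[] data) {
        for (byte b : data) {
            boolean inAlphabet = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '+' || b == '/';
            if (!inAlphabet) {
                return false;
            }
        }
        return true;
    }

    // Same shape as LazyBinary#decodeIfNeeded, with java.util.Base64 as the decoder.
    // Note: inputs whose length is 1 modulo 4 make this decoder throw; the sketch
    // only targets the values from the CSV above.
    static byte[] decodeIfNeeded(byte[] recv) {
        return looksLikeBase64(recv) ? Base64.getDecoder().decode(recv) : recv;
    }

    public static void main(String[] args) {
        byte[] abc = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(abc));                 // [97, 98, 99]
        System.out.println(Arrays.toString(decodeIfNeeded(abc))); // [105, -74]

        // '^' and ' ' are outside the Base64 alphabet, so this value is left alone.
        byte[] yard = "^Ayard duty^B".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(yard, decodeIfNeeded(yard))); // true
    }
}
```

This matches the query output: only the rows whose image value consists purely of Base64 alphabet characters come back altered.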
When I query the table with SQL in Spark, it returns the bytes 69 B7 (hex). The Hive CLI/Beeline returns the string "i�" (char array 105, 65533).
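
The three representations mentioned here (signed bytes 105 -74, hex 69 B7, and chars 105 65533) are all views of the same two bytes. The sketch below illustrates that, assuming the client renders the binary value by decoding it as UTF-8; that is my assumption, not something I have confirmed in the Hive CLI or Beeline source. The class and method names are only illustrative.

```java
import java.nio.charset.StandardCharsets;

final class ByteViews {

    // Formats bytes as two-digit uppercase hex separated by spaces.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the bytes as UTF-8 (assumed client behavior); Java replaces
    // malformed sequences with U+FFFD, the replacement character.
    static String toUtf8Text(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] decoded = {105, -74};
        System.out.println(toHex(decoded)); // 69 B7

        // 0xB7 is a lone UTF-8 continuation byte, so it cannot be decoded on its own.
        for (char c : toUtf8Text(decoded).toCharArray()) {
            System.out.println((int) c);    // 105, then 65533
        }
    }
}
```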
Why do the input and output data differ for LOAD DATA? INSERT INTO works fine.
Is this a bug or a limitation? Does a binary column loaded from CSV only support Base64-encoded values, or strings for which the Base64 check (isArrayByteBase64) returns false?
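
If the answer is that only Base64 is supported, one possible workaround would be to Base64-encode the binary field before writing the CSV, so that the decode step in decodeIfNeeded restores the original bytes. The sketch below is untested against Hive; `encodeField` and `toCsvRow` are hypothetical helpers, and the column order follows the hivetable definition above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class EncodeBinaryField {

    // Base64-encodes the raw bytes destined for the binary column.
    static String encodeField(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Builds one '|'-delimited row: id, label, name, image, autoLabel.
    static String toCsvRow(int id, boolean label, String name, byte[] image, boolean autoLabel) {
        return id + "|" + label + "|" + name + "|" + encodeField(image) + "|" + autoLabel;
    }

    public static void main(String[] args) {
        byte[] image = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toCsvRow(2, false, "2.png", image, true)); // 2|false|2.png|YWJj|true

        // Decoding the encoded field gives back the original bytes.
        byte[] roundTrip = Base64.getDecoder().decode(encodeField(image));
        System.out.println(Arrays.equals(image, roundTrip));          // true
    }
}
```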