Why can't Hive load a normal string as binary from CSV?
https://issues.apache.org/jira/browse/HIVE-21626
Hive-1.2.2
hive> CREATE TABLE IF NOT EXISTS hivetable (
    >     id int,
    >     label boolean,
    >     name string,
    >     image binary,
    >     autoLabel boolean)
    > row format delimited fields terminated by '|';
OK
Time taken: 0.068 seconds
hive> LOAD DATA LOCAL INPATH '/Users/xubo/Desktop/xubo/git/carbondata3/integration/spark-common-test/src/test/resources/binarystringdata2.csv' INTO TABLE hivetable;
Loading data to table default.hivetable
Table default.hivetable stats: [numFiles=1, totalSize=82]
OK
Time taken: 0.122 seconds
hive> select * from hivetable;
OK
2       false   2.png   i�      true
3       false   3.png   n*%�    false
1       true    1.png   ^Ayard duty^B   true


binarystringdata2.csv data is:
```
2|false|2.png|abc|true
3|false|3.png|biology|false
1|true|1.png|^Ayard duty^B|true
```


binarystringdata2.csv is without \u0001, like over1k in the Hive project.

For "abc" in the CSV, reading from Hive after loading should return abc, so
why is it "i�"? The bytes of abc are byte[] 97 98 99. After
org.apache.hadoop.hive.serde2.lazy.LazyBinary#decodeIfNeeded, they are treated
as Base64 and decoded, which returns byte[] 105 -73 (0x69 0xB7):
  public static byte[] decodeIfNeeded(byte[] recv) {
    boolean arrayByteBase64 = Base64.isArrayByteBase64(recv);
    if (LOG.isDebugEnabled() && arrayByteBase64) {
      LOG.debug("Data only contains Base64 alphabets only so try to decode the data.");
    }
    return arrayByteBase64 ? Base64.decodeBase64(recv) : recv;
  }
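To make the effect easy to reproduce outside Hive, here is a minimal sketch of the same decision logic. It is not the Hive code: looksLikeBase64 is a hypothetical, simplified stand-in for the isArrayByteBase64 check, and java.util.Base64 is swapped in for the decoder Hive uses, so edge cases may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class Base64Probe {
    // Hypothetical, simplified stand-in for the isArrayByteBase64 check:
    // true when every byte is a Base64 alphabet character, '=' padding, or whitespace.
    static boolean looksLikeBase64(byte[] data) {
        for (byte b : data) {
            boolean alphabet = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
            boolean whitespace = b == ' ' || b == '\t' || b == '\n' || b == '\r';
            if (!alphabet && !whitespace) {
                return false;
            }
        }
        return true;
    }

    // Same shape as decodeIfNeeded above, with java.util.Base64 swapped in (an assumption).
    static byte[] decodeIfNeeded(byte[] recv) {
        return looksLikeBase64(recv) ? Base64.getMimeDecoder().decode(recv) : recv;
    }

    public static void main(String[] args) {
        byte[] abc = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(abc));                 // raw bytes of "abc"
        System.out.println(Arrays.toString(decodeIfNeeded(abc))); // decoded, no longer "abc"

        // The third CSV row contains the control bytes 0x01 and 0x02, which are not
        // Base64 characters, so the value passes through unchanged.
        byte[] row3 = "\u0001yard duty\u0002".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(row3, decodeIfNeeded(row3)));

        // Base64-encoding the value before writing it to the CSV round-trips to "abc".
        byte[] encoded = Base64.getEncoder().encode(abc);
        System.out.println(new String(decodeIfNeeded(encoded), StandardCharsets.US_ASCII));
    }
}
```

Under these assumptions, a plain string made only of letters and digits is indistinguishable from Base64 text, which is consistent with "abc" and "biology" being altered while the row containing ^A and ^B is not.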


When we query with SQL in Spark, it returns byte[] 0x69 0xB7; the Hive
CLI/Beeline returns the string "i�" (char array is 105 65533).
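The difference between the two views can be sketched as follows, assuming the client renders the binary value as UTF-8 text (an assumption, not confirmed from the Hive source). The byte 0xB7 is not a valid UTF-8 sequence on its own, so it is replaced by U+FFFD (65533). ReplacementCharDemo and render are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {
    // Decodes raw bytes as UTF-8, as a text client might when displaying a value.
    static String render(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two bytes produced by Base64-decoding "abc".
        byte[] decoded = {0x69, (byte) 0xB7};
        String shown = render(decoded);
        // Print each char as a number to compare with the char array quoted above.
        for (char c : shown.toCharArray()) {
            System.out.println((int) c);
        }
    }
}
```

So the Spark result and the CLI result would describe the same two stored bytes; only the presentation differs.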

Why are the input and output data different for Hive LOAD DATA? INSERT INTO
works as expected.

Is this a bug or a limitation? Does a binary column loaded from CSV only
support Base64-encoded data, or strings for which isArrayByteBase64 returns
false?
