[ https://issues.apache.org/jira/browse/HIVE-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820715#comment-16820715 ]

xubo245 commented on HIVE-21626:
--------------------------------

[~amanin][~antmanin][~amanin] [~$iddhe$h] Please help to check it.

> Why hive can't load normal string as binary from csv?
> -----------------------------------------------------
>
>                 Key: HIVE-21626
>                 URL: https://issues.apache.org/jira/browse/HIVE-21626
>             Project: Hive
>          Issue Type: Bug
>         Environment: hive client: hive1.2.2
>            Reporter: xubo245
>            Priority: Major
>
> Why hive can't load normal string as binary from csv?
> Hive-1.2.2
> {code:java}
> hive> CREATE TABLE IF NOT EXISTS hivetable (
>     > id int,
>     > label boolean,
>     > name string,
>     > image binary,
>     > autoLabel boolean)
>     > row format delimited fields terminated by '|';
> OK
> Time taken: 0.068 seconds
> hive> LOAD DATA LOCAL INPATH
> '/Users/xubo/Desktop/xubo/git/carbondata3/integration/spark-common-test/src/test/resources/binarystringdata2.csv'
> INTO TABLE hivetable;
> Loading data to table default.hivetable
> Table default.hivetable stats: [numFiles=1, totalSize=82]
> OK
> Time taken: 0.122 seconds
> hive> select * from hivetable;
> OK
> 2 false 2.png i� true
> 3 false 3.png n*%�
> false
> 1 true 1.png ^Ayard duty^B true
> {code}
> The data in binarystringdata2.csv is:
> {code:java}
> 2|false|2.png|abc|true
> 3|false|3.png|biology|false
> 1|true|1.png|^Ayard duty^B|true
> {code}
> binarystringdata2.csv is without \u0001, like over1k of the Hive project.
> For "abc" in the csv, reading it back from Hive after loading should
> return abc, so why is it "i�"? The bytes of abc are byte[] 97 98 99. After
> org.apache.hadoop.hive.serde2.lazy.LazyBinary#decodeIfNeeded, the value is
> treated as Base64 and decoded, which returns byte[] 105 -73:
> {code:java}
> public static byte[] decodeIfNeeded(byte[] recv) {
>   boolean arrayByteBase64 = Base64.isArrayByteBase64(recv);
>   if (LOG.isDebugEnabled() && arrayByteBase64) {
>     LOG.debug("Data only contains Base64 alphabets only so try to decode the data.");
>   }
>   return arrayByteBase64 ? Base64.decodeBase64(recv) : recv;
> }
> {code}
> When we query with SQL in Spark, it returns the bytes 0x69 0xB7; the Hive
> client/beeline returns the string "i�" (the char array is 105 65533).
> Why are the input and output data different for Hive LOAD DATA? INSERT INTO
> is ok.
> Is this a bug or a limitation? Does a binary column loaded from csv only
> support Base64-encoded data, or strings for which isArrayByteBase64
> returns false?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
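
The decoding step described in the issue can be reproduced outside Hive. The following is a minimal sketch, not Hive's actual code path: it assumes that the JDK's java.util.Base64 decodes these particular inputs the same way as the commons-codec Base64 class used by LazyBinary, and looksLikeBase64 is a hypothetical, simplified stand-in for Base64.isArrayByteBase64 (it ignores whitespace handling). The JDK decoder is stricter than commons-codec on some malformed inputs, so this only illustrates the values from the issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64Surprise {
    // Hypothetical stand-in for commons-codec's Base64.isArrayByteBase64:
    // true when every byte is a Base64 alphabet character or '='.
    static boolean looksLikeBase64(byte[] data) {
        for (byte b : data) {
            boolean ok = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Same shape as LazyBinary#decodeIfNeeded: decode only when the whole
    // field looks like Base64, otherwise return the bytes unchanged.
    static byte[] decodeIfNeeded(byte[] recv) {
        return looksLikeBase64(recv) ? Base64.getDecoder().decode(recv) : recv;
    }

    public static void main(String[] args) {
        // "abc" consists only of Base64 alphabet characters, so it is decoded.
        byte[] abc = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(decodeIfNeeded(abc))); // [105, -73]

        // The control characters \u0001 and \u0002 are outside the alphabet,
        // so this field is passed through untouched.
        byte[] raw = "\u0001yard duty\u0002".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(raw, decodeIfNeeded(raw))); // true
    }
}
```

Under these assumptions, "abc" comes back as the two bytes 0x69 0xB7. 0x69 is the character 'i' and 0xB7 alone is not valid UTF-8, which matches the "i�" shown by the Hive client, while the row containing ^A and ^B is returned as loaded.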