If you use CTAS, a MapReduce (MR) job runs. Maybe the problem is in the MR job.
2015-09-15 

xihuyu2000 

From: Jason Dere <jd...@hortonworks.com>
Sent: 2015-09-15 06:00
Subject: Re: binary column data consistency in hive table copy
To: "user@hive.apache.org"<user@hive.apache.org>
Cc:

Looks like your table is using the text storage format. Binary data needs to be
stored as base64 with TextInputFormat, so those values are probably being
interpreted as base64 strings.
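The bintarget contents in the original message are consistent with a lossy base64 round trip. Below is a hypothetical Python model, not Hive code, built on these assumptions: when reading a binary column from a text-format table, Hive base64-decodes a field only if every character is in the base64 alphabet (or '='), otherwise it uses the raw bytes; the decoder is lenient about missing '=' padding and drops leftover bits; and binary values are always base64-encoded when written (as in the CTAS target). The names hive_text_read and hive_text_write are illustrative only.

```python
import base64
import string

# Standard base64 alphabet (RFC 4648).
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
B64_INDEX = {ch: i for i, ch in enumerate(B64_ALPHABET)}


def looks_like_base64(field: str) -> bool:
    # Assumption: a field counts as base64 if it uses only alphabet characters and '='.
    return all(ch in B64_INDEX or ch == "=" for ch in field)


def lenient_b64decode(field: str) -> bytes:
    # Decode 6 bits per character without requiring '=' padding.
    # Leftover bits that do not fill a whole byte are silently dropped.
    acc, nbits, out = 0, 0, bytearray()
    for ch in field:
        if ch == "=":  # padding marks the end of the encoded data
            break
        acc = (acc << 6) | B64_INDEX[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)


def hive_text_read(field: str) -> bytes:
    # Hypothetical model of reading a binary field from a text-format table.
    if looks_like_base64(field):
        return lenient_b64decode(field)
    return field.encode("utf-8")


def hive_text_write(value: bytes) -> str:
    # Hypothetical model of writing a binary value to a text-format table.
    return base64.b64encode(value).decode("ascii")


source = ["10000101", "121", "10", "1011", "Asfs"]
target = [hive_text_write(hive_text_read(line)) for line in source]
for before, after in zip(source, target):
    print(before, "->", after)
```

Under these assumptions, inputs whose length is a multiple of 4 survive unchanged, because 4 base64 characters carry exactly 3 bytes. "121" (18 bits) and "10" (12 bits) leave 2 and 4 stray bits that the decoder drops; re-encoding then fills with zero bits and '=' padding, giving "120=" and "1w==". A field containing a non-base64 character (for example "hello!") would instead be stored raw on read and show up base64-encoded ("aGVsbG8h") in the copy.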

From: Ujjwal Wadhawan <uwadha...@gmail.com>
Sent: Monday, September 14, 2015 2:32 PM
To: user@hive.apache.org
Subject: binary column data consistency in hive table copy 

Hi all,


I recently observed a behavior in Hive that I'd like to share and get input on.

Scenario:

Say you have a Hive table with a binary column:

create table binsource (bincol binary);

and some input data:

$ cat /nis3/home/ujjwal2/test2/binin
10000101
121
10
1011
Asfs


Let’s load the data into the table:

LOAD DATA LOCAL INPATH '/home/ujjwal2/test2/binin' OVERWRITE INTO TABLE binsource;

When I run select * in the Hive CLI, I see the following characters (see image).

The underlying HDFS file still has the actual input though.
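A plausible explanation for the garbled CLI output with an intact file: if, as the reply above suggests, each field is interpreted as base64, every line of binin decodes to bytes that are mostly not printable text, and the CLI would then be displaying those decoded bytes. The sketch below assumes that interpretation; decode_padded is a hypothetical helper that adds the '=' padding Python's base64 decoder expects (it assumes no field has a length of 4k+1, which holds for this data).

```python
import base64

lines = ["10000101", "121", "10", "1011", "Asfs"]


def decode_padded(field: str) -> bytes:
    # Pad to a multiple of 4 characters so base64.b64decode accepts the input.
    # Assumes len(field) % 4 != 1, which is not valid base64 at all.
    return base64.b64decode(field + "=" * (-len(field) % 4))


for line in lines:
    raw = decode_padded(line)
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8,
    # roughly what a terminal shows for such binary values.
    print(line, "->", raw, "->", raw.decode("utf-8", errors="replace"))
```

Every decoded value here contains at least one byte of 0x80 or above, so none of them is plain ASCII text, while the file on HDFS still holds the original characters that LOAD DATA copied verbatim.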

Now I make a copy of this table using the command "create table ujjwal2.bintarget as select * from ujjwal2.binsource;".

ISSUE:


Now when I look at the underlying file created on HDFS for bintarget, I see some
extra characters.

In the many combinations I have tried, the extra characters are “=”, “w” and “A”.


10000101
120=
1w==
1011
Asfs


Does anyone know what these characters signify?

Best,
Ujjwal
