If you use CTAS, a MapReduce (MR) job is launched. Maybe the problem is in the MR job.

2015-09-15
xihuyu2000

From: Jason Dere <jd...@hortonworks.com>
Sent: 2015-09-15 06:00
Subject: Re: binary column data consistency in hive table copy
To: "user@hive.apache.org" <user@hive.apache.org>
Cc:

Looks like your table is using the text storage format. Binary data needs to be stored as base64 in TextInputFormat, so those values are probably being interpreted as base64 strings.

From: Ujjwal Wadhawan <uwadha...@gmail.com>
Sent: Monday, September 14, 2015 2:32 PM
To: user@hive.apache.org
Subject: binary column data consistency in hive table copy

Hi all,

I recently observed a behavior in Hive that I'd like to share and get input on.

Scenario: Say you have a Hive table with a binary column.

    create table binsource (bincol binary);

and some input data

    $ cat /nis3/home/ujjwal2/test2/binin
    10000101
    121
    10
    1011
    Asfs

Let's load the data into the table:

    LOAD DATA LOCAL INPATH '/home/ujjwal2/test2/binin' OVERWRITE INTO TABLE binsource;

When I do a select * in the Hive CLI, I see the following characters (see image). The underlying HDFS file still has the actual input, though.

Now I make a copy of this table using the command "create table ujjwal2.bintarget as select * from ujjwal2.binsource;".

ISSUE: Now when I look at the underlying file created on HDFS for bintarget, I see some extra characters. In the many combinations I have tried, the extra characters are "=", "w" and "A".

    10000101
    120=
    1w==
    1011
    Asfs

Does anyone know what these characters signify?

Best,
Ujjwal
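
For illustration, the extra characters in the copied file are consistent with the base64 explanation given in Jason's reply. The Python sketch below is not Hive code. It assumes the text reader decodes each field as base64 with a lenient decoder (emulated here by padding the value to a multiple of four characters) and that the CTAS job re-encodes the resulting bytes when writing the new file. The function name `roundtrip` is hypothetical.

```python
import base64


def roundtrip(text_value: str) -> str:
    """Decode a text field as base64, then encode the bytes again.

    Assumption: the padding step stands in for a lenient decoder that
    accepts values whose length is not a multiple of four.
    """
    padded = text_value + "=" * (-len(text_value) % 4)
    raw = base64.b64decode(padded)
    return base64.b64encode(raw).decode("ascii")


# The five values from the input file in the original message.
inputs = ["10000101", "121", "10", "1011", "Asfs"]
results = {value: roundtrip(value) for value in inputs}

for value, copied in results.items():
    print(f"{value} -> {copied}")
# 10000101 -> 10000101
# 121 -> 120=
# 10 -> 1w==
# 1011 -> 1011
# Asfs -> Asfs
```

Under these assumptions, values whose length is already a multiple of four survive unchanged, while "121" and "10" come back as "120=" and "1w==", matching the bintarget file. The "=" characters are base64 padding, and the changed last character comes from leftover bits being dropped during decoding.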