I tried recreating your exact dump with a more recent Hive (built about 3 weeks ago). In addition to the base64-decoded version of the binary data, I get some extraneous characters in every line of the select * output, and they are consistently the same extra characters.
For example, an od -c of the first line of this table gives:

N U L L \t a b c \t 357 277 275 M \n

whereas I expected the base64-decode of "001" to be just "M".

Saving this to another equivalent table with a CTAS (create table as select) yields an encoding similar to the original file for the last two lines, and an extra "=" at the end of each line before them. That encoding, in turn, seems stable if I CTAS from that table to another. All 3 tables yield the same output when I do select *, and I get the same output even when I CTAS to an rcfile.

The problem might be with the LazySimpleSerDe binary decode, but if so, the encode is affected as well. Or the problem might be with how binary data is output by select *. Either way, this merits creating a JIRA to address.

On Wed, Sep 4, 2013 at 2:35 AM, Arun Vasu <arun...@gmail.com> wrote:
> Hi,
> I am using Hive 10. When I create an external table with column type as
> Binary, the query result on the table is showing some junk values for the
> column with binary datatype.
>
> Please find below the query I have used to create the table:
>
> CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '^'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hivetables/testbinary';
>
> The query I have used is : select * from bool1
>
> The sample data in the hdfs file is:
>
> 0^a...@abc.com^001
> 1^a...@abc.com^010
> ^a...@abc.com^011
> ^a...@abc.com^100
> t^a...@abc.com^101
> f^a...@abc.com^110
> true^a...@abc.com^111
> false^a...@abc.com^001
> 123^ ^01100010
> 12344^ ^01100001
>
> Please share your inputs if it is possible.
>
> Thanks,
> Arun
>
> --
> Thanks,
> Arun
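
A rough sketch, outside Hive, of one way the three extra bytes in the od -c output of the reply could arise. It rests on two assumptions that are not confirmed for Hive: that the decoder is lenient about the missing "=" padding on "001", and that the output path renders the decoded bytes as UTF-8 text. The helper name lenient_b64decode is made up for this illustration. Under those assumptions, "001" decodes to two bytes (0xD3 followed by "M") rather than "M" alone, and 0xD3 on its own is not valid UTF-8, so it would be shown as the replacement character U+FFFD, whose UTF-8 bytes are 357 277 275 in octal.

```python
import base64


def lenient_b64decode(text: str) -> bytes:
    """Pad the input to a multiple of 4 characters, then decode.

    Hypothetical helper: it only imitates a decoder that tolerates
    missing '=' padding; it is not Hive's actual code path.
    """
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


# Decode the value from the first sample row.
raw = lenient_b64decode("001")

# Assumption: the bytes are displayed as UTF-8 text, with invalid
# sequences replaced by U+FFFD, and then written back out as UTF-8.
shown = raw.decode("utf-8", errors="replace").encode("utf-8")

# Octal rendering of each output byte, comparable to od -c for non-printables.
octal = " ".join(format(b, "o") for b in shown)

print(raw)
print(shown)
print(octal)
```

If that is what is happening, the binary value itself may be decoded consistently and only the display is lossy, which would fit the second possibility raised in the reply (how binary data is output by select *). This is a guess at the mechanism, not a confirmed diagnosis.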