Yes, tentatively that is what I have to do. Another way is to convert the data to a base64-encoded string; after the client receives the data, it decodes it back to binary. This is a hack, but it works.
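The round trip can be sketched like this (java.util.Base64 is used purely for illustration; a commons-codec Base64 encoder would work the same way):

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    public static void main(String[] args) {
        // Arbitrary binary payload, including bytes that are not valid UTF-8.
        byte[] blob = {0x00, (byte) 0xFF, (byte) 0x80, 0x7F, 0x01};

        // Server side: encode the binary into a plain ASCII string,
        // which can be stored in a Hive String column.
        String encoded = Base64.getEncoder().encodeToString(blob);

        // Client side: decode the string back into the original bytes.
        byte[] decoded = Base64.getDecoder().decode(encoded);

        if (!Arrays.equals(blob, decoded)) {
            throw new AssertionError("round trip failed");
        }
        System.out.println("round trip ok: " + encoded);
    }
}
```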

If Hive supported byte arrays as a native data type, the solution would be much more elegant.

Jimmy.

--------------------------------------------------
From: "Ted Yu" <yuzhih...@gmail.com>
Sent: Tuesday, October 12, 2010 4:33 PM
To: <dev@hive.apache.org>
Subject: Re: blob handling in hive

How about UTF-8 encoding your blob and storing it in Hive as a String?
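One caveat: decoding arbitrary bytes directly as UTF-8 is lossy, because invalid sequences are replaced with U+FFFD, so a base64 step (as mentioned above) is needed for a lossless round trip. A minimal demonstration, assuming Java 7's StandardCharsets:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Lossy {
    public static void main(String[] args) {
        // 0x80 and 0xFF are not valid UTF-8 sequences on their own.
        byte[] blob = {(byte) 0x80, (byte) 0xFF, 0x41};

        // Decoding replaces each invalid sequence with U+FFFD, so the
        // original bytes cannot be recovered by re-encoding.
        String asString = new String(blob, StandardCharsets.UTF_8);
        byte[] back = asString.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(blob, back)); // prints "false"
    }
}
```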

On Tue, Oct 12, 2010 at 4:20 PM, Jinsong Hu <jinsong...@hotmail.com> wrote:

I thought about that too, but then I would need to write a bytes inspector
and stick it into Hive's inspector factory. We would also need to create a
new data type, such as blob, among Hive's supported data types. Adding a
new supported data type to Hive is a non-trivial task, as more code would
need to be touched.

I am just wondering if it is possible to get what I want without such a
big change.



Jimmy.

--------------------------------------------------
From: "Ted Yu" <yuzhih...@gmail.com>
Sent: Tuesday, October 12, 2010 4:12 PM

To: <dev@hive.apache.org>
Subject: Re: blob handling in hive

How about creating an org.apache.hadoop.hive.serde2.io.BytesWritable which
wraps byte[]?

On Tue, Oct 12, 2010 at 3:49 PM, Jinsong Hu <jinsong...@hotmail.com>
wrote:

Storing the blob in HBase is too costly; HBase compaction costs a lot of
CPU. All I want to do is read the byte array out of a sequence file and
map that byte array to a Hive column. I can write a SerDe for this purpose.

I tried defining the data as array<tinyint> and then writing a custom
SerDe. After I get the byte array off the disk, I need to map it, so I
wrote the code:

columnTypes =
    TypeInfoUtils.getTypeInfosFromTypeString("int,string,array<tinyint>");

But then how do I convert the data in the row.set() method?

I tried this:

    byte[] bContent = ev.get_content() == null
        ? null : ev.get_content().getData();
    org.apache.hadoop.hive.serde2.io.ByteWritable tContent = bContent == null
        ? new org.apache.hadoop.hive.serde2.io.ByteWritable()
        : new org.apache.hadoop.hive.serde2.io.ByteWritable(bContent[0]);
    row.set(2, tContent);

This works for a single byte, but not for a byte array.
Any way I can get the byte array returned in SQL would be appreciated.
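A sketch of the missing step, assuming the SerDe's ObjectInspector treats array<tinyint> elements as standard Java Byte objects (if it uses writable inspectors, wrap each element in a ByteWritable instead); the helper name here is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class TinyintArrayColumn {
    // Convert the raw bytes into the List form that a standard Java
    // ObjectInspector for array<tinyint> expects: one Byte per element.
    static List<Byte> toTinyintArray(byte[] bContent) {
        if (bContent == null) {
            return null;
        }
        List<Byte> out = new ArrayList<Byte>(bContent.length);
        for (byte b : bContent) {
            out.add(Byte.valueOf(b));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Byte> col = toTinyintArray(new byte[]{1, 2, 3});
        System.out.println(col); // prints "[1, 2, 3]"
        // In the SerDe, this list would then be passed to
        // row.set(2, col) instead of a single ByteWritable.
    }
}
```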

Jimmy

--------------------------------------------------
From: "Ted Yu" <yuzhih...@gmail.com>
Sent: Tuesday, October 12, 2010 2:19 PM
To: <dev@hive.apache.org>
Subject: Re: blob handling in hive


One way is to store the blob in HBase and use HBaseHandler to access your
blob.

On Tue, Oct 12, 2010 at 2:14 PM, Jinsong Hu <jinsong...@hotmail.com>
wrote:

Hi,

I am using Sqoop to export data from MySQL to Hive. I noticed that Hive
doesn't have a blob data type yet. Is there any way I can make Hive
store blobs?

Jimmy





