One more question. I have everything working except a Map<String,String>.

I understand that the whole Map will be physically stored as a single Text 
object in the RCFile.

I have had considerable trouble setting up the delimiters for this Map.

I want to have
        MAP KEYS TERMINATED BY '='
        COLLECTION ITEMS TERMINATED BY '&'

Hive doesn't seem to want to take that. I have also tried using the ASCII
octal codes.
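
For reference, here is a rough sketch of the full DDL I have in mind (the table
and column names are made up, and I am not sure whether my clause ordering
matches what the DELIMITED grammar expects):

        CREATE TABLE example_events (
          id    BIGINT,
          attrs MAP<STRING,STRING>
        )
        ROW FORMAT DELIMITED
          FIELDS TERMINATED BY '\001'
          COLLECTION ITEMS TERMINATED BY '&'
          MAP KEYS TERMINATED BY '='
        STORED AS RCFILE;

With those delimiters, a row's map column would end up as a single delimited
string such as name=foo&size=10.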

What do I need to set up to make this Map work?

Thanks.

Steve 

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com] 
Sent: Thursday, March 17, 2011 5:09 PM
To: user@hive.apache.org
Subject: Re: Building Custom RCFiles

Yes. It is the same as with normal Hive tables.

thanks
yongqiang
On Thu, Mar 17, 2011 at 4:54 PM, Severance, Steve <ssevera...@ebay.com> wrote:
> Thanks Yongqiang.
>
> So for more complex types like map, do I just set up a
>
> ROW FORMAT DELIMITED MAP KEYS TERMINATED BY '|' etc...
>
> Thanks.
>
> Steve
>
> -----Original Message-----
> From: yongqiang he [mailto:heyongqiang...@gmail.com]
> Sent: Thursday, March 17, 2011 4:35 PM
> To: user@hive.apache.org
> Subject: Re: Building Custom RCFiles
>
> A side note: in Hive, we save all columns as Text internally (even if
> the column's type is int or double, etc.). In some experiments, strings
> turned out to be more compression-friendly, but it costs CPU to decode
> them back to their original types.
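>
> As a rough illustration (this is not Hive's actual code path; Hive's lazy
> classes do the real decoding), getting an int column back from its text
> form costs a parse like:
>
>         import org.apache.hadoop.io.Text;
>
>         // Illustrative only: a column value stored as the text "12345"
>         // is parsed back to an int at read time, trading CPU for bytes
>         // that compress better as strings.
>         Text raw = new Text("12345");
>         int value = Integer.parseInt(raw.toString());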
>
> Thanks
> Yongqiang
> On Thu, Mar 17, 2011 at 4:04 PM, yongqiang he <heyongqiang...@gmail.com> 
> wrote:
>> You need to customize Hive's ColumnarSerde (maybe the functions in
>> LazySerde): its serialize and deserialize functions, depending on whether
>> you want to read or write. And the main thing is that you need to use
>> your own type definitions (not LazyInt/LazyLong).
>>
>> If your type is int or long (not double/float), casting it to string
>> only wastes some CPU, but can save you more space.
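>>
>> As a rough sketch of the write side (this is not Hive's own code; the
>> RCFile classes are real, but the method, path, and column count are made
>> up), writing every column, including the numeric ones, as the bytes of its
>> string form lets the stock ColumnarSerde decode it:
>>
>>         import java.io.IOException;
>>         import org.apache.hadoop.conf.Configuration;
>>         import org.apache.hadoop.fs.FileSystem;
>>         import org.apache.hadoop.fs.Path;
>>         import org.apache.hadoop.hive.ql.io.RCFile;
>>         import org.apache.hadoop.hive.ql.io.RCFileOutputFormat;
>>         import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
>>         import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
>>
>>         void writeOneRow(Configuration conf) throws IOException {
>>           // Tell the writer how many columns each row group has.
>>           RCFileOutputFormat.setColumnNumber(conf, 2);
>>           FileSystem fs = FileSystem.get(conf);
>>           RCFile.Writer writer =
>>               new RCFile.Writer(fs, conf, new Path("/tmp/rcfile-part-00000"));
>>
>>           // One row: an int column and a string column, both stored as the
>>           // UTF-8 bytes of their string form.
>>           BytesRefArrayWritable row = new BytesRefArrayWritable(2);
>>           row.set(0, new BytesRefWritable("12345".getBytes("UTF-8")));
>>           row.set(1, new BytesRefWritable("hello".getBytes("UTF-8")));
>>           writer.append(row);
>>           writer.close();
>>         }
>>
>> The idea is that the table keeps its declared int/bigint column types and
>> ColumnarSerde parses the text back on read.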
>>
>> Thanks
>> Yongqiang
>> On Thu, Mar 17, 2011 at 3:48 PM, Severance, Steve <ssevera...@ebay.com> 
>> wrote:
>>> Hi,
>>>
>>> I am working on building an MR job that generates RCFiles that will become
>>> partitions of a Hive table. I have most of it working; however, only strings
>>> (Text) are being deserialized inside of Hive. The Hive table is specified to
>>> use a ColumnarSerde, which I thought should allow the Writable types stored
>>> in the RCFile to be deserialized properly.
>>>
>>> Currently all numeric types (IntWritable and LongWritable) come back as null.
>>>
>>> Has anyone else seen anything like this or have any ideas? I would rather
>>> not convert all my data to strings to use RCFile.
>>>
>>> Thanks.
>>>
>>> Steve
>>
>
