A side note: in Hive, all columns are stored as Text internally
(even when the column's type is int, double, etc.). In some
experiments, strings turned out to be more compression-friendly. But it
costs CPU to decode the text back to its original type.
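
To make the trade-off concrete, here is a minimal sketch (not Hive code) that
encodes the same integers as decimal text and as fixed-width binary, then
compresses both with zlib so the sizes can be compared. The function names,
the newline separator, and the choice of zlib are assumptions for
illustration only; RCFile uses its own column layout and codec, so results
on real data may differ.

```python
import struct
import zlib


def encode_as_text(values):
    """Encode ints as newline-separated decimal strings (UTF-8)."""
    return "\n".join(str(v) for v in values).encode("utf-8")


def encode_as_binary(values):
    """Encode ints as fixed-width 4-byte big-endian integers."""
    return struct.pack(f">{len(values)}i", *values)


def decode_text(data):
    """The extra CPU step: parse the text form back into ints."""
    if not data:
        return []
    return [int(token) for token in data.decode("utf-8").split("\n")]


def compressed_sizes(values):
    """Return raw and zlib-compressed byte counts for both encodings."""
    text = encode_as_text(values)
    binary = encode_as_binary(values)
    return {
        "text_raw": len(text),
        "binary_raw": len(binary),
        "text_zlib": len(zlib.compress(text)),
        "binary_zlib": len(zlib.compress(binary)),
    }


if __name__ == "__main__":
    # Hypothetical sample column; substitute values from your own data.
    sample = [i % 1000 for i in range(100_000)]
    print(compressed_sizes(sample))
```

Running it on a sample of your own column values is the only reliable way to
see which encoding compresses better for your data.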

Thanks
Yongqiang
On Thu, Mar 17, 2011 at 4:04 PM, yongqiang he <heyongqiang...@gmail.com> wrote:
> You need to customize the serialize and deserialize functions of Hive's
> ColumnarSerde (maybe functions in LazySerde), depending on whether you
> want to read or write. The main thing is that you need to use your own
> type definitions (not LazyInt/LazyLong).
>
> If your type is int or long (not double/float), casting it to string
> only wastes some CPU, but can save you more space.
>
> Thanks
> Yongqiang
> On Thu, Mar 17, 2011 at 3:48 PM, Severance, Steve <ssevera...@ebay.com> wrote:
>> Hi,
>>
>>
>>
>> I am working on building an MR job that generates RCFiles that will become
>> partitions of a Hive table. I have most of it working; however, only strings
>> (Text) are being deserialized inside of Hive. The Hive table is specified to
>> use a columnarserde, which I thought should allow the writable types stored
>> in the RCFile to be deserialized properly.
>>
>>
>>
>> Currently all numeric types (IntWritable and LongWritable) come back as null.
>>
>>
>>
>> Has anyone else seen anything like this or have any ideas? I would rather
>> not convert all my data to strings to use RCFile.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> Steve
>
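
The nulls described in the quoted question are consistent with a text-based
deserializer being handed binary integer bytes. The sketch below illustrates
that failure mode in plain Python; it is not Hive's implementation, and
`lazy_parse_int` is a hypothetical helper. The assumption is that a lazy int
type parses the column bytes as decimal text and yields null when the parse
fails.

```python
import struct


def lazy_parse_int(raw: bytes):
    """Parse column bytes as decimal text; return None (null) on failure."""
    try:
        return int(raw.decode("ascii"))
    except (UnicodeDecodeError, ValueError):
        # Bytes that are not a decimal string cannot be interpreted as an int.
        return None


if __name__ == "__main__":
    as_text = str(42).encode("ascii")   # what a text-based reader expects
    as_binary = struct.pack(">i", 42)   # 4-byte big-endian integer bytes
    print(lazy_parse_int(as_text))
    print(lazy_parse_int(as_binary))
```

Under that assumption, writing the numeric columns as their decimal string
form (or supplying custom types, as suggested above) avoids the null results.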
