One more question. I have everything working except a Map<String,String>.
I understand that the whole Map will be physically stored as a single Text object in the RCFile. I have had considerable trouble setting up the delimiters for this Map. I want to have:

MAP KEYS TERMINATED BY '='
COLLECTION ITEMS TERMINATED BY '&'

Hive doesn't seem to accept that. I have also tried using the ASCII octal codes. What do I need to set up to make this Map work?

Thanks.

Steve

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com]
Sent: Thursday, March 17, 2011 5:09 PM
To: user@hive.apache.org
Subject: Re: Building Custom RCFiles

Yes. It is the same as with normal Hive tables.

thanks
yongqiang

On Thu, Mar 17, 2011 at 4:54 PM, Severance, Steve <ssevera...@ebay.com> wrote:
> Thanks Yongqiang.
>
> So for more complex types like map, do I just set up a
>
> ROW FORMAT DELIMITED KEYS TERMINATED BY '|' etc...
>
> Thanks.
>
> Steve
>
> -----Original Message-----
> From: yongqiang he [mailto:heyongqiang...@gmail.com]
> Sent: Thursday, March 17, 2011 4:35 PM
> To: user@hive.apache.org
> Subject: Re: Building Custom RCFiles
>
> A side note: in Hive, we save all columns as Text internally
> (even when the column's type is int, double, etc.). In some
> experiments, strings were more friendly to compression, but it takes CPU
> to decode them back to their original type.
>
> Thanks
> Yongqiang
> On Thu, Mar 17, 2011 at 4:04 PM, yongqiang he <heyongqiang...@gmail.com>
> wrote:
>> You need to customize the serialize and deserialize functions of Hive's
>> ColumnarSerde (and maybe functions in LazySerde), depending on whether you
>> want to read or write. The main thing is that you need to use your own type
>> definitions (not LazyInt/LazyLong).
>>
>> If your type is int or long (not double/float), casting it to string
>> only wastes some CPU, but can save you more space.
>>
>> Thanks
>> Yongqiang
>> On Thu, Mar 17, 2011 at 3:48 PM, Severance, Steve <ssevera...@ebay.com>
>> wrote:
>>> Hi,
>>>
>>> I am working on building an MR job that generates RCFiles that will become
>>> partitions of a Hive table. I have most of it working; however, only strings
>>> (Text) are being deserialized inside of Hive. The Hive table is specified to
>>> use a columnarserde, which I thought should allow the writable types stored
>>> in the RCFile to be deserialized properly.
>>>
>>> Currently all numeric types (IntWritable and LongWritable) come back as null.
>>>
>>> Has anyone else seen anything like this or have any ideas? I would rather
>>> not convert all my data to strings to use RCFile.
>>>
>>> Thanks.
>>>
>>> Steve
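Since the whole Map<String,String> ends up as one text value in its RCFile column, the MR job has to flatten each map into a single delimited string before writing it. The sketch below assumes the delimiters from the question ('&' between entries, '=' between key and value); the class name `MapColumnEncoder` is hypothetical, nothing is escaped (entries containing a delimiter are rejected), and `decode` is only a round-trip check, not Hive's own parser. If the table instead keeps Hive's default delimiters, the same call works with '\002' between items and '\003' between key and value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper for an MR job writing a Map<String,String> column.
 * The whole map becomes one delimited string; no escaping is done.
 */
final class MapColumnEncoder {

    /** Flattens the map, e.g. "k1=v1&k2=v2" for itemDelim '&' and keyDelim '='. */
    static String encode(Map<String, String> map, char itemDelim, char keyDelim) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            String k = e.getKey();
            String v = e.getValue();
            if (k == null || v == null) {
                throw new IllegalArgumentException("null keys/values are not handled in this sketch");
            }
            if (hasDelim(k, itemDelim, keyDelim) || hasDelim(v, itemDelim, keyDelim)) {
                throw new IllegalArgumentException("entry contains a delimiter: " + k);
            }
            if (sb.length() > 0) {
                sb.append(itemDelim); // separator between entries
            }
            sb.append(k).append(keyDelim).append(v);
        }
        return sb.toString();
    }

    /** Round-trip check only: split on itemDelim, then on the first keyDelim of each item. */
    static Map<String, String> decode(String s, char itemDelim, char keyDelim) {
        Map<String, String> out = new LinkedHashMap<>();
        if (s.isEmpty()) {
            return out;
        }
        int start = 0;
        while (start <= s.length()) {
            int end = s.indexOf(itemDelim, start);
            if (end < 0) {
                end = s.length(); // last item runs to the end of the string
            }
            String item = s.substring(start, end);
            int k = item.indexOf(keyDelim);
            if (k < 0) {
                out.put(item, null); // no key delimiter: key with no value
            } else {
                out.put(item.substring(0, k), item.substring(k + 1));
            }
            start = end + 1;
        }
        return out;
    }

    private static boolean hasDelim(String s, char a, char b) {
        return s.indexOf(a) >= 0 || s.indexOf(b) >= 0;
    }
}
```

In the MR job, the UTF-8 bytes of the encoded string would then be the value written for that column (for example via a `BytesRefWritable` in the row passed to the RCFile writer); how Steve's job builds its rows is an assumption here, since the thread does not show it.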
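Yongqiang's note that Hive keeps every column as text points to a likely reason the IntWritable/LongWritable columns come back as null: a lazy text parser expects decimal digits, while `IntWritable` serializes its value through `DataOutput.writeInt` as four big-endian binary bytes. The plain-Java sketch below contrasts the two encodings. `NumericColumnEncoding` is hypothetical, and `parseText` is a simplified stand-in for LazyInt/LazyLong-style parsing (no overflow checks), not Hive's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical comparison of text vs. binary encodings for a numeric column. */
final class NumericColumnEncoding {

    /** Text encoding: the decimal digits as UTF-8 bytes, which a text parser can read. */
    static byte[] asText(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    /** Binary encoding: 4 big-endian bytes, the same layout DataOutput.writeInt produces. */
    static byte[] asBinaryInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Simplified text parser: returns null when the bytes are not a decimal number. */
    static Long parseText(byte[] b) {
        if (b.length == 0) {
            return null;
        }
        int start = (b[0] == '-') ? 1 : 0; // optional leading minus sign
        if (start == b.length) {
            return null;
        }
        long result = 0;
        for (int i = start; i < b.length; i++) {
            if (b[i] < '0' || b[i] > '9') {
                return null; // non-digit byte, e.g. raw binary data
            }
            result = result * 10 + (b[i] - '0'); // overflow ignored in this sketch
        }
        return start == 1 ? -result : result;
    }
}
```

With this stand-in, the text bytes of 42 parse back to 42, while the binary bytes of 42 yield null, which matches the symptom in the original question. The space trade-off depends on the values: 7 takes one byte as text versus four as a binary int, but 1234567 takes seven bytes versus four, before any compression.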