On Wed, Mar 30, 2011 at 3:46 PM, Michael Jiang <it.mjji...@gmail.com> wrote:
> Also what if I want just one step to load each log entry line from log file
> and for each generate multiple lines? That is, just one table created. I
> don't want to have one table and then call explode() to get multiple lines.
> Otherwise, alternative way is to use streaming on loaded table to turn it
> into another one with no need to customize a serde. So, yeah, the goal here
> is to see how a serde can do this stuff.
>
> Thanks!
>
> On Wed, Mar 30, 2011 at 12:03 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>>
>> On Wed, Mar 30, 2011 at 2:55 PM, Michael Jiang <it.mjji...@gmail.com>
>> wrote:
>> > Want to extend RegexSerDe to parse apache web log: for each log entry,
>> > need to convert it into multiple entries. This is easy in streaming. But
>> > new to serde, wondering if it is doable and how? Thanks!
>> >
>>
>> You can have your serde produce list<struct> and then explode() them.

The role of a SerDe is to take the output of the InputFormat and use the table definition stored in the metastore to decode it into columns. Because a SerDe turns one input record into one row, it is not a good place to turn a single row into multiple rows.

What I am suggesting is to define a column like this:

create table ... (id int, log_entries array<string>) row format serde ...;

Make sure your serde decodes each log line and populates log_entries. From there you can use a lateral view with explode
http://wiki.apache.org/hadoop/Hive/LanguageManual/LateralView
to turn the array into rows.

Edward
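For illustration, here is a minimal sketch in plain Java (no Hive classes) of the two halves of this approach. It assumes, hypothetically, that "multiple entries" means the query-string parameters of each request; the names LogEntrySplitter, Row, decode and explode are made up for this example. decode() stands in for the work a SerDe's deserialize() would do (one log line in, one row with an array<string> column out), and explode() mimics what a query such as "select id, entry from ... lateral view explode(log_entries) t as entry" does at query time. A real SerDe would also have to describe that row to Hive through its ObjectInspector, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogEntrySplitter {
    // Hypothetical row shape mirroring (id int, log_entries array<string>).
    static final class Row {
        final int id;
        final List<String> logEntries;

        Row(int id, List<String> logEntries) {
            this.id = id;
            this.logEntries = logEntries;
        }
    }

    // Simplified pattern for the Apache common log format:
    // host ident user [time] "method url protocol" status bytes
    private static final Pattern COMMON_LOG = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+)$");

    // SerDe side: one input line becomes exactly one row; the "many entries"
    // live inside the array column. Assumption: entries = query-string parameters.
    static Row decode(int id, String line) {
        Matcher m = COMMON_LOG.matcher(line);
        List<String> entries = new ArrayList<>();
        if (!m.matches()) {
            return new Row(id, entries); // unparseable line: empty array
        }
        String url = m.group(6);
        int q = url.indexOf('?');
        if (q >= 0 && q + 1 < url.length()) {
            entries.addAll(Arrays.asList(url.substring(q + 1).split("&")));
        }
        return new Row(id, entries);
    }

    // Query side: like LATERAL VIEW explode(log_entries), emit one
    // (id, entry) pair per array element.
    static List<String[]> explode(List<Row> rows) {
        List<String[]> out = new ArrayList<>();
        for (Row r : rows) {
            for (String e : r.logEntries) {
                out.add(new String[] {Integer.toString(r.id), e});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /search?q=hive&lang=en HTTP/1.0\" 200 2326";
        List<Row> rows = Collections.singletonList(decode(1, line));
        for (String[] pair : explode(rows)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Running main() prints two rows for the single log line, "1	q=hive" and "1	lang=en". The table itself still holds one row per log line; the fan-out only happens when a query explodes the array, which is why the SerDe does not need to produce multiple rows itself.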