On Wed, Mar 30, 2011 at 3:46 PM, Michael Jiang <it.mjji...@gmail.com> wrote:
> Also what if I want just one step to load each log entry line from log file
> and for each generate multiple lines? That is, just one table created. I
> don't want to have one table and then call explode() to get multiple lines.
> Otherwise, alternative way is to use streaming on loaded table to turn it
> into another one with no need to customize a serde. So, yeah, the goal here
> is to see how a serde can do this stuff.
>
> Thanks!
>
> On Wed, Mar 30, 2011 at 12:03 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>>
>> On Wed, Mar 30, 2011 at 2:55 PM, Michael Jiang <it.mjji...@gmail.com>
>> wrote:
>> > Want to extend RegexSerDe to parse apache web log: for each log entry,
>> > need
>> > to convert it into multiple entries. This is easy in streaming. But new
>> > to
>> > serde, wondering if it is doable and how? Thanks!
>> >
>>
>> You can have your serde produce list<struct> and then explode() them.
>
>

The role of a SerDe is to take the output of the InputFormat and use
the information in the metastore to decode it into columns. It handles
one record at a time, so it is not a good place to turn a single row
into multiple rows.

What I am suggesting is to define a column like this:

create table ...( id int, log_entries array<string>) row format serde....

Make sure your serde decodes and populates log_entries.

From there you can use lateral view and explode()
http://wiki.apache.org/hadoop/Hive/LanguageManual/LateralView to turn
the log_entries list into rows.
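
As a rough sketch of that last step, the query could look something like
the one below. The table name weblogs and the aliases exploded and entry
are made up for illustration; only id and log_entries come from the
create table line above.

```sql
-- Hypothetical table name "weblogs"; id and log_entries are the columns
-- from the create table sketch above.
-- explode() emits one output row per element of the log_entries list,
-- and lateral view joins each of those rows back to its source row.
SELECT w.id, entry
FROM weblogs w
LATERAL VIEW explode(w.log_entries) exploded AS entry;
```

Each input row with N elements in log_entries should come back as N rows,
all carrying the same id.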


Edward
