Well, if there is no known solution, maybe extend RegexSerDe ...

On Tue, Mar 29, 2011 at 5:36 PM, Michael Jiang <it.mjji...@gmail.com> wrote:

> hey guys,
>
> I want to extract some information from an Apache web log. The task goes
> beyond extracting fixed fields that appear at certain positions, such as
> host and request. One task is to extract multiple key/value pairs from the
> request string. For example, the request string contains parameters like
> "name.0", "name.1", ..., "name.n", where "n" can be any non-negative integer.
> They may appear anywhere in the request. The goal is not just to extract
> each key/value pair :) I want to clone the entry line once for each "name.i"
> parameter it contains, so that the i-th cloned entry has an extra field
> holding the value of "name.i".
>
> I can first load the log and extract the request string into a table, then
> write a streaming script that extracts the "name" key/value pairs and writes
> the "n" cloned entries to stdout. But is there a one-step solution that
> extracts them all from the log file and generates the multiple entries as
> well? I know "org.apache.hadoop.hive.contrib.serde2.RegexSerDe" can load and
> parse an Apache web log. Is it possible to use it for this case? Thanks!
>
> --mj
>
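
For reference, the two-step streaming fallback described in the quoted message could be sketched roughly as below. This is only an illustration under assumptions that are not in the original thread: the script receives tab-separated rows on stdin with the host in the first column and the raw request string (e.g. 'GET /path?name.0=x HTTP/1.1') in the second, and the function names expand and process are made up for this sketch.

```python
import re
import sys
from urllib.parse import parse_qsl, urlsplit

# Matches parameter names of the form name.0, name.1, ..., name.n
NAME_KEY = re.compile(r"^name\.(\d+)$")


def expand(host, request):
    """Yield one (host, request, index, value) tuple per name.i parameter.

    Assumes `request` looks like 'GET /path?a=1&name.0=x HTTP/1.1'.
    """
    parts = request.split(" ")
    # Take the URL part if the request has the 'METHOD URL PROTOCOL' shape.
    url = parts[1] if len(parts) >= 2 else request
    query = urlsplit(url).query
    for key, value in parse_qsl(query, keep_blank_values=True):
        match = NAME_KEY.match(key)
        if match:
            yield host, request, int(match.group(1)), value


def process(lines):
    """Turn tab-separated input rows into one output row per name.i."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows instead of failing the job
        host, request = fields[0], fields[1]
        for row in expand(host, request):
            yield "\t".join(str(field) for field in row) + "\n"


# To run as a streaming script, wire stdin to stdout:
#     sys.stdout.writelines(process(sys.stdin))
```

A row whose request contains no "name.i" parameter produces no output here; if such rows should be kept, process would need to emit them unchanged. In Hive, a script like this would typically be invoked through a SELECT TRANSFORM(...) USING '...' AS (...) query over the table that holds the extracted request column.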
