Hey guys, I want to extract some information from an Apache web log. The task goes beyond extracting fixed fields that appear at known positions, such as host and request. One part of it is to extract multiple key/value pairs from the request string. For example, the request string has parameters like "name.0", "name.1", ..., "name.n", where "n" can be any non-negative integer, and they may appear anywhere in the request. And it's not just a matter of extracting each key/value pair :) I want to clone the entry line "n" times if it contains "name.i" n times, with the i-th cloned entry getting an extra field that holds the value of "name.i".
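To make the desired output concrete, here is a rough Python sketch of the cloning step. It is only an illustration under some assumptions: the input is one tab-separated line per log entry with host first and request second, the request looks like "METHOD URL PROTOCOL", and the names `clone_lines` and `NAME_KEY` are made up for this example.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Matches parameter names of the form name.0, name.1, ..., name.n
NAME_KEY = re.compile(r"^name\.\d+$")


def clone_lines(lines):
    """For each input line, yield one output line per name.i parameter.

    Assumption: each line is "host<TAB>request", and request is
    "METHOD URL PROTOCOL" as in a typical Apache access log.
    """
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not have host and request
        request_parts = fields[1].split(" ")
        if len(request_parts) < 2:
            continue  # skip requests without a URL part
        query = urlsplit(request_parts[1]).query
        # parse_qsl keeps parameters in the order they appear in the request
        for key, value in parse_qsl(query, keep_blank_values=True):
            if NAME_KEY.match(key):
                # clone the original entry and append the value as an extra field
                yield line + "\t" + value


sample = ["1.2.3.4\tGET /a?name.0=alice&x=1&name.1=bob HTTP/1.1\n"]
for out in clone_lines(sample):
    print(out)
```

In this sketch an entry with no "name.i" parameter produces no output at all; whether such entries should be kept once is a separate decision. To use it as a streaming script, the same function could be fed `sys.stdin` instead of the sample list.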
I can load the log and extract the request string into a table first, then write a streaming script that extracts the "name" key/value pairs and writes the "n" cloned entries to stdout. But is there a one-step solution that extracts everything from the log file and generates the multiple entries as well? I know "org.apache.hadoop.hive.contrib.serde2.RegexSerDe" can load and parse an Apache web log. Is it possible to use it for this case? Thanks! --mj