Hello Hive Experts,

I use Flume to ingest application-specific logs from Syslog to HDFS.
Currently, I grep the HDFS directory for specific patterns (for multiple
types of requests) and then create reports.  However, generating weekly and
monthly reports this way is not scalable.

I would like to create multiple external tables on the daily HDFS directory,
partitioned by date and using RegexSerDe, and then create separate Parquet
tables for each kind of request.
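
Something along these lines is what I have in mind for the raw external table
(the table name, column names, and the regex are placeholders for my actual
log format):

  CREATE EXTERNAL TABLE raw_syslog (
    ts      STRING,
    host    STRING,
    message STRING
  )
  PARTITIONED BY (log_date STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
  WITH SERDEPROPERTIES ("input.regex" = "([^ ]*) ([^ ]*) (.*)")
  STORED AS TEXTFILE
  LOCATION '/user/ffffprod/';

  -- register each daily directory as a partition
  ALTER TABLE raw_syslog ADD PARTITION (log_date='2016-06-20')
    LOCATION '/user/ffffprod/2016-06-20';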

My question is: how do I create multiple (about 20) RegexSerDe tables on the
same data, each applying a different filter?  This would be the equivalent of
the ~20 grep commands I am running today.

Example:

  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'STORE Request Received for APPXXXX' | awk '{print $4, $13, $14, $17, $20}'
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'SCAN Request Received for APPYYYY' | awk '{print $4, $14, $19, $21, $22}'
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'TOTAL TIME' | awk '{print $4, $24}'
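
In Hive, I imagine each of those filters would become a query over the raw
table sketched above, something like the following (the field positions are
only illustrative, not my real offsets; split() returns a 0-based array):

  SELECT split(message, ' ')[0]  AS req_ts,    -- placeholder offsets
         split(message, ' ')[9]  AS field_a,
         split(message, ' ')[10] AS field_b
  FROM raw_syslog
  WHERE log_date = '2016-06-20'
    AND message LIKE '%STORE Request Received for APPXXXX%';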

I would like to create tables that do this kind of filtering and then write
the output to Parquet tables.
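
For example, one Parquet table per request type, created from such a filtered
query (table and column names here are made up):

  CREATE TABLE store_requests
  STORED AS PARQUET
  AS
  SELECT split(message, ' ')[0] AS req_ts,     -- placeholder field positions
         split(message, ' ')[9] AS field_a
  FROM raw_syslog
  WHERE message LIKE '%STORE Request Received for APPXXXX%';

I would end up with about 20 such statements, one per request type.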

Please let me know how this can be done.  Thank you!

Regards,
Arun
