Hi

I would suggest creating a single external table with daily partitions and 
multiple views each with the appropriate filtering.
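A minimal sketch of that layout, with illustrative table/view/column names and a placeholder regex (the real pattern depends on the Syslog line format):

```sql
-- One external table over the raw log directory, partitioned by day.
-- Columns and input.regex below are placeholders; tailor them to the actual log layout.
CREATE EXTERNAL TABLE app_logs (
  ts      STRING,
  app     STRING,
  message STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(\\S+)\\s+(\\S+)\\s+(.*)$'
)
STORED AS TEXTFILE
LOCATION '/user/ffffprod';

-- Register each daily directory as a partition.
ALTER TABLE app_logs ADD IF NOT EXISTS PARTITION (dt='2016-06-20')
  LOCATION '/user/ffffprod/2016-06-20';

-- One view per request type, each applying its own filter.
CREATE VIEW store_requests AS
SELECT dt, ts, app, message
FROM app_logs
WHERE message LIKE '%STORE Request Received%';
```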
If you’ll send me a log sample (~100 rows), I’ll send you an example.

Dudu

From: Arun Patel [mailto:arunp.bigd...@gmail.com]
Sent: Tuesday, June 21, 2016 1:51 AM
To: user@hive.apache.org
Subject: RegexSerDe with Filters

Hello Hive Experts,

I use Flume to ingest application-specific logs from Syslog to HDFS.  
Currently, I grep the HDFS directory for specific patterns (for multiple types 
of requests) and then create reports.  However, generating weekly and monthly 
reports this way is not scalable.

I would like to create multiple external tables on the daily HDFS directory, 
partitioned by date, using RegexSerDe, and then create separate Parquet tables 
for each kind of request.

The question is: how do I create multiple (about 20) RegexSerDe tables on the 
same data, each applying a filter?  This would be just like the 20 grep 
commands I run today.

Example:  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'STORE Request Received for APPXXXX' | awk '{print $4, $13, $14, $17, $20}'
                hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'SCAN Request Received for APPYYYY' | awk '{print $4, $14, $19, $21, $22}'
                hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'TOTAL TIME' | awk '{print $4, $24}'
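For reference, the first grep/awk pipeline above could be expressed as a Hive view over a hypothetical single-column external table raw_logs(line STRING) on the same directory (the field-index mapping is an assumption — split() is 0-based, so awk's $4 corresponds to index 3):

```sql
-- Illustrative sketch: one view per request type, replacing one grep/awk pipeline.
CREATE VIEW v_store_requests AS
SELECT
  split(line, ' ')[3]  AS ts,       -- awk $4
  split(line, ' ')[12] AS field13,  -- awk $13
  split(line, ' ')[13] AS field14,  -- awk $14
  split(line, ' ')[16] AS field17,  -- awk $17
  split(line, ' ')[19] AS field20   -- awk $20
FROM raw_logs
WHERE line LIKE '%STORE Request Received for APPXXXX%';
```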

I would like to create tables which do this kind of job and then write the 
output to Parquet tables.
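Materializing a filtered view into Parquet can be done with CREATE TABLE AS SELECT (the table and view names here are illustrative):

```sql
-- Write one request type out as Parquet.
CREATE TABLE store_requests_parquet
STORED AS PARQUET
AS SELECT * FROM v_store_requests;

-- Or append subsequent days into an existing Parquet table:
INSERT INTO TABLE store_requests_parquet
SELECT * FROM v_store_requests WHERE dt = '2016-06-21';
```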

Please let me know how this can be done.  Thank you!

Regards,
Arun
