Hi

I would suggest creating a single external table with daily partitions and 
multiple views each with the appropriate filtering.
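A minimal sketch of that layout, with illustrative table/view/column names and a placeholder regex (the real pattern depends on the Syslog line format):

```sql
-- One external table over the raw log directory, partitioned by day.
-- Columns and input.regex below are placeholders; tailor them to the actual log layout.
CREATE EXTERNAL TABLE app_logs (
  ts      STRING,
  app     STRING,
  message STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(\\S+)\\s+(\\S+)\\s+(.*)$'
)
STORED AS TEXTFILE
LOCATION '/user/ffffprod';

-- Register each daily directory as a partition.
ALTER TABLE app_logs ADD IF NOT EXISTS PARTITION (dt='2016-06-20')
  LOCATION '/user/ffffprod/2016-06-20';

-- One view per request type, each applying its own filter.
CREATE VIEW store_requests AS
SELECT dt, ts, app, message
FROM app_logs
WHERE message LIKE '%STORE Request Received%';
```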
If you’ll send me a log sample (~100 rows), I’ll send you an example.

Dudu

From: Arun Patel [mailto:arunp.bigd...@gmail.com]
Sent: Tuesday, June 21, 2016 1:51 AM
To: user@hive.apache.org
Subject: RegexSerDe with Filters

Hello Hive Experts,

I use Flume to ingest application-specific logs from Syslog to HDFS.  
Currently, I grep the HDFS directory for specific patterns (for multiple types 
of requests) and then create reports.  However, generating weekly and monthly 
reports this way is not scalable.

I would like to create multiple external tables on the daily HDFS directory, 
partitioned by date, using RegexSerDe, and then create separate Parquet tables 
for each kind of request.

The question is: how do I create multiple (about 20) RegexSerDe tables on the 
same data, each applying a filter?  This would be just like the 20 grep 
commands I run today.

Example:  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'STORE Request Received for APPXXXX' | awk '{print $4, $13, $14, $17, $20}'
                hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'SCAN Request Received for APPYYYY' | awk '{print $4, $14, $19, $21, $22}'
                hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'TOTAL TIME' | awk '{print $4, $24}'
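For reference, the first grep/awk pipeline above could be expressed as a Hive view over a hypothetical single-column external table raw_logs(line STRING) on the same directory (the field-index mapping is an assumption — split() is 0-based, so awk's $4 corresponds to index 3):

```sql
-- Illustrative sketch: one view per request type, replacing one grep/awk pipeline.
CREATE VIEW v_store_requests AS
SELECT
  split(line, ' ')[3]  AS ts,       -- awk $4
  split(line, ' ')[12] AS field13,  -- awk $13
  split(line, ' ')[13] AS field14,  -- awk $14
  split(line, ' ')[16] AS field17,  -- awk $17
  split(line, ' ')[19] AS field20   -- awk $20
FROM raw_logs
WHERE line LIKE '%STORE Request Received for APPXXXX%';
```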

I would like to create tables which do this kind of job and then write the 
output to Parquet tables.
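Materializing a filtered view into Parquet can be done with CREATE TABLE AS SELECT (the table and view names here are illustrative):

```sql
-- Write one request type out as Parquet.
CREATE TABLE store_requests_parquet
STORED AS PARQUET
AS SELECT * FROM v_store_requests;

-- Or append subsequent days into an existing Parquet table:
INSERT INTO TABLE store_requests_parquet
SELECT * FROM v_store_requests WHERE dt = '2016-06-21';
```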

Please let me know how this can be done.  Thank you!

Regards,
Arun
