Hello Hive Experts,

I use Flume to ingest application-specific logs from Syslog to HDFS. Currently, I grep the HDFS directory for specific patterns (one per request type) and then create reports. However, generating weekly and monthly reports this way is not scalable.
I would like to create multiple external tables over the daily HDFS directory, partitioned by date, using RegexSerDe, and then create a separate Parquet table for each kind of request. My question is: how do I create multiple (about 20) RegexSerDe tables on the same data, each applying its own filter? This would replace the 20 grep commands I run today. Example:

hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'STORE Request Received for APPXXXX' | awk '{print $4, $13, $14, $17, $20}'
hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'SCAN Request Received for APPYYYY' | awk '{print $4, $14, $19, $21, $22}'
hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'TOTAL TIME' | awk '{print $4, $24}'

I would like to create tables that do this kind of job and then write the output to Parquet tables. Please let me know how this can be done.

Thank you!
Regards,
Arun
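
P.S. To make the question concrete, here is a rough sketch of what I'm imagining for one of the 20 request types. The table names, column names, and regex below are placeholders (the real capture groups depend on my actual Syslog line format); I'm assuming lines that don't match input.regex deserialize as all NULLs, so they can be filtered out:

-- External table over the raw log lines, one per request type,
-- all pointing at the same HDFS directory.
CREATE EXTERNAL TABLE store_requests_raw (
  event_time STRING,
  field13    STRING,
  field14    STRING,
  field17    STRING,
  field20    STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- dummy pattern: one capture group per column
  "input.regex" = "^(\\S+) .*STORE Request Received for APPXXXX.* (\\S+) (\\S+) (\\S+) (\\S+)$"
)
STORED AS TEXTFILE
LOCATION '/user/ffffprod/';

-- register each daily directory as a partition
ALTER TABLE store_requests_raw ADD IF NOT EXISTS
  PARTITION (dt='2016-06-20') LOCATION '/user/ffffprod/2016-06-20';

-- materialize the matching rows into a Parquet table;
-- the IS NOT NULL filter drops lines the regex did not match
CREATE TABLE store_requests STORED AS PARQUET AS
SELECT event_time, field13, field14, field17, field20
FROM store_requests_raw
WHERE dt = '2016-06-20' AND event_time IS NOT NULL;

Is pointing 20 such external tables at the same LOCATION, each with its own input.regex, a reasonable approach, or is there a better pattern for this?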