We do a similar process with our log files in Hive. We only handle 30 to 60
files (all with a similar structure) at a time, but it sounds like it would
fit your model.
We create an external table, then do hdfs puts to add the files to the table:
CREATE EXTERNAL TABLE log_import(
  date STRING,
  time STRING,
  ...
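Filled out, a minimal version of that workflow looks something like this (the
remaining columns, the delimiter, and the HDFS path here are just placeholders,
not our actual schema):

CREATE EXTERNAL TABLE log_import(
  `date` STRING,
  `time` STRING,
  level STRING,
  message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/log_import';

-- then add files from the shell; Hive sees them on the next query:
-- hdfs dfs -put server01.log /data/log_import/

Any file you put under the table's LOCATION directory becomes visible to
queries right away, which is what makes the external table convenient for this
kind of rolling import.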
You can create a table whose schema matches your files' structure and then add
each file as a partition of that table (see the ALTER TABLE / Partition
statements:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements).
You can then query individual files with a WHERE clause on the partition
column, along the lines of the sketch below.
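A rough sketch of that approach (the table name, columns, and paths are
illustrative assumptions, not taken from your setup):

CREATE EXTERNAL TABLE logs_by_file(
  line STRING)
PARTITIONED BY (source_file STRING)
LOCATION '/data/logs_by_file';

-- register each log file as its own partition
ALTER TABLE logs_by_file ADD PARTITION (source_file = 'server01.log')
  LOCATION '/data/logs_by_file/server01';

-- query a single file via the partition column
SELECT count(*) FROM logs_by_file
WHERE source_file = 'server01.log' AND line LIKE '%ERROR%';

Because source_file is a partition column, the WHERE clause only reads the
data under that one partition's directory.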
I believe Hive does not have a feature that provides this information. You may
want to write a custom MapReduce program and get the name of the file being
processed, as shown below:
((FileSplit) context.getInputSplit()).getPath()
and then emit the file name whenever an occurrence of the word is found.
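A minimal mapper sketch along those lines, using the org.apache.hadoop.mapreduce
API; the search term is passed in through a hypothetical "search.word" job
configuration property:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WordFileMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private String word;     // word to search for
  private Text fileName;   // name of the file this split belongs to

  @Override
  protected void setup(Context context) {
    // hypothetical configuration property carrying the search term
    word = context.getConfiguration().get("search.word");
    // file backing this input split
    fileName = new Text(((FileSplit) context.getInputSplit()).getPath().getName());
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // emit the file name once per occurrence of the word in this line
    for (String token : value.toString().split("\\s+")) {
      if (token.equals(word)) {
        context.write(fileName, ONE);
      }
    }
  }
}

Pairing this with a summing reducer (for example the built-in
org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer) then gives the number of
occurrences per file.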