I have the data in an S3 bucket laid out as follows:

```
s3://logs/ad1date1.log.gz
s3://logs/ad1date2.log.gz
s3://logs/ad1date3.log.gz
s3://logs/ad1date4.log.gz
s3://logs/ad2date1.log.gz
s3://logs/ad2date2.log.gz
s3://logs/ad2date3.log.gz
s3://logs/ad2date4.log.gz
```
I have to load some of these files into a single Hive table, for which I am using the following query:
```
CREATE EXTERNAL TABLE analyze_files_tab (
  cookie       STRING,
  d2           STRING,
  url          STRING,
  d4           STRING,
  d5           STRING,
  d6           STRING,
  adv_id_dummy STRING,
  timestp      STRING,
  ip           STRING,
  userAgent    STRING,
  stage        STRING,
  d12          STRING,
  d13          STRING
)
PARTITIONED BY (adv_id STRING, `date` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://logs/joined_analyze_files_hive/';
```

How should I load the files into this table? Will

```
ALTER TABLE analyze_files_tab RECOVER PARTITIONS;
```

do the trick? Don't I need to map which file belongs to which (adv_id, date) combination? Also, a pointer to a good tutorial for beginners would be helpful.
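For what it's worth, here is a sketch of the two approaches I understand to be possible, assuming the table definition above; the `adv_id=ad1/date=date1/` directory layout shown is a hypothetical reorganization of the flat `.log.gz` files, not something that exists yet. My understanding is that `RECOVER PARTITIONS` (an Amazon EMR Hive extension) only discovers partitions whose data sits in `key=value` subdirectories under the table's `LOCATION`, so flat files like `ad1date1.log.gz` would either need to be moved into that layout first, or be registered one partition at a time:

```
-- Option 1: explicit mapping, one ADD PARTITION per (adv_id, date)
-- combination. Each partition points at a directory (not a single
-- file) containing that combination's .gz logs.
ALTER TABLE analyze_files_tab
  ADD PARTITION (adv_id = 'ad1', `date` = 'date1')
  LOCATION 's3://logs/joined_analyze_files_hive/adv_id=ad1/date=date1/';

-- Option 2: if the files are first rearranged under the table's
-- LOCATION in Hive's key=value layout, e.g.
--   s3://logs/joined_analyze_files_hive/adv_id=ad1/date=date1/ad1date1.log.gz
-- then EMR Hive can discover all partitions in one pass:
ALTER TABLE analyze_files_tab RECOVER PARTITIONS;
```

In either case the file-to-partition mapping comes from the directory structure, not from the file names themselves.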