Hi,

Sorry, this is an AWS-specific Hive question. I have two external Hive tables for my custom logs:
1. A flat directory structure on AWS S3, with no partitions and files in bz2-compressed format (a few big files).
2. Three levels of partitions on AWS S3 (lots of small uncompressed files).

I noticed that my queries on the partitioned table take forever to run. The same queries run fine and finish quickly on the table with no partitions. Am I missing something? I suspect this has something to do with the way S3 behaves.

An example query is:

select id, (max(unix_timestamp(ts, "MM/dd/yyyy HH:mm")) - min(unix_timestamp(ts, "MM/dd/yyyy HH:mm")))/(60*60) from logs group by id;

Thanks,
Richin