Hello All,

I have been using AWS EMR for some time and am now setting up Spark/Shark on my own cluster. I am installing from https://github.com/downloads/mesos/spark/spark-0.6.0-sources.tar.gz, which includes Hive 0.9.0. I am using this with S3 and am unable to recover partitions from a directory that contains a series of partition subdirectories. I want two partitions, ds=2012-10-25 and ds=2012-10-26, each containing its respective files. For example, I have the following files located at s3://varickTest3/nn/:
    drwxrwxrwx  -         0 1970-01-01 00:00 /nn/ds=2012-10-25
    -rwxrwxrwx  1  49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00000
    -rwxrwxrwx  1  49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00001
    drwxrwxrwx  -         0 1970-01-01 00:00 /nn/ds=2012-10-26
    -rwxrwxrwx  1  49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00000
    -rwxrwxrwx  1  49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00001

When I run the following in Hive (not Shark):

    CREATE EXTERNAL TABLE wiki(id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
    PARTITIONED BY (ds STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3n://varickTest3/nn';

    ALTER TABLE wiki RECOVER PARTITIONS;

the result is an empty table. I have tried many variations of this and nothing has worked so far, including:

- running MSCK REPAIR TABLE wiki;
- using s3 rather than s3n (credentials for both schemes are set in core-site.xml); and
- setting the options
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;

If I instead use LOCATION 's3n://varickTest3/nn/*', the table does have content, but I am still unable to recover the partitions.

Is there any way, via settings or data layout (rather than writing a script), to partition the table using the directories, as I can on AWS? Thank you for any help anyone can give me.
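For completeness, the per-directory workaround I am trying to avoid would look roughly like this (a sketch using the two dates from my layout above):

    -- Register each partition directory by hand (what a script would generate);
    -- paths assume the s3n://varickTest3/nn layout shown above.
    ALTER TABLE wiki ADD PARTITION (ds='2012-10-25') LOCATION 's3n://varickTest3/nn/ds=2012-10-25';
    ALTER TABLE wiki ADD PARTITION (ds='2012-10-26') LOCATION 's3n://varickTest3/nn/ds=2012-10-26';

    -- Check what the metastore actually knows about:
    SHOW PARTITIONS wiki;

Maintaining statements like these by hand (or generating them with a script) clearly will not scale past a handful of dates, which is why I am hoping for a discovery-based approach like the one RECOVER PARTITIONS gives me on EMR.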