RE: Files Per Partition Causing Slowness

2014-12-02 Thread Mike Roberts
Subject: Re: Files Per Partition Causing Slowness To: user@hive.apache.org Thank you Edward, I knew the number of partitions mattered, but I didn't think 1000 would be too much. However, I didn't realize the number of files per partition was also a factor prior to job submission. I am

Re: Files Per Partition Causing Slowness

2014-12-02 Thread Edward Capriolo
This is discussed in the Programming Hive book. The more files, the longer it takes the JobTracker to plan the job. The more tasks, the more things the JobTracker has to track. The more partitions, the more metastore lookups are required. All of these things limit throughput. I do not like tables wi
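
To see how many files and partitions a query would have to plan over, a rough count can help. The following is a minimal sketch, not anything from the thread: it assumes a copy of the table's directory tree is reachable as a local path (for a table on HDFS the same walk would have to go through an HDFS client instead), and the names `files_per_partition` and `summarize` are made up for illustration. Files starting with `.` or `_` are skipped on the assumption that they are markers such as `_SUCCESS` rather than data.

```python
import os
from collections import Counter


def files_per_partition(table_dir):
    """Count data files in each directory under table_dir that holds any.

    Assumes a Hive-style layout such as table_dir/dt=2014-12-01/part-00000.
    Keys are partition paths relative to table_dir.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(table_dir):
        # Assumption: names starting with "." or "_" are markers, not data.
        data_files = [f for f in filenames if not f.startswith((".", "_"))]
        if data_files:
            counts[os.path.relpath(dirpath, table_dir)] = len(data_files)
    return counts


def summarize(counts):
    """Totals that roughly track planning cost: partitions and files."""
    return {
        "partitions": len(counts),
        "files": sum(counts.values()),
        "max_files_in_partition": max(counts.values(), default=0),
    }
```

Calling `summarize(files_per_partition("/path/to/table"))` on a hypothetical path gives the partition count, the total file count, and the largest per-partition file count, which are the quantities the message above ties to metastore lookups and job planning time.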