Subject: Re: Files Per Partition Causing Slowness
To: user@hive.apache.org
Thank you Edward. I knew the number of partitions mattered, but I didn't think
1000 would be too much. However, I didn't realize the number of files per
partition was also a factor prior to job submission.
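
To put rough numbers on it, here is a back-of-the-envelope sketch. It assumes
each small file becomes its own input split and therefore its own map task,
which is my assumption and will not hold if a combining input format is in
use. The function name and the 50 files per partition are made up for
illustration; only the 1000 partitions comes from my table.

```python
def estimate_planning_load(partitions, files_per_partition):
    """Rough counts of the work that scales with partitions and files.

    Assumption (not confirmed in this thread): one input split, and so
    one map task, per file. Combined splits would lower the task count.
    """
    total_files = partitions * files_per_partition
    return {
        "metastore_partition_lookups": partitions,  # one per partition
        "files_to_list": total_files,               # listed while planning
        "map_tasks": total_files,                   # under the assumption above
    }


# 1000 partitions as in my table; 50 files per partition is hypothetical.
load = estimate_planning_load(partitions=1000, files_per_partition=50)
print(load)
```

Even a modest file count per partition multiplies quickly across the table,
which would explain the delay before the job is submitted.
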
I am
This is discussed in the Programming Hive book. The more files, the longer
it takes the JobTracker to plan the job. The more tasks, the more things
the JobTracker has to track. The more partitions, the more metastore
lookups are required. All of these things limit throughput. I do not like
tables wi