Re: Hive File Sizes, Merging, and Splits

2012-09-26 Thread Ruslan Al-Fakikh
tasks is BETTER. > > > > From: John Omernik [j...@omernik.com] > Sent: Tuesday, September 25, 2012 7:11 PM > To: user@hive.apache.org > Subject: Re: Hive File Sizes, Merging, and Splits > > Isn't there an overhead associated with each map task? Based on that, m

RE: Hive File Sizes, Merging, and Splits

2012-09-25 Thread Connell, Chuck
But remember that you are running on parallel machines. Depending on the hardware configuration, more map tasks is BETTER. From: John Omernik [j...@omernik.com] Sent: Tuesday, September 25, 2012 7:11 PM To: user@hive.apache.org Subject: Re: Hive File Sizes

Re: Hive File Sizes, Merging, and Splits

2012-09-25 Thread John Omernik
Isn't there an overhead associated with each map task? Based on that, my hypothesis is if I pay attention to my data, merge up small files after load, and ensure split sizes are close to file sizes, I can keep the number of map tasks to an absolute minimum. On Tue, Sep 25, 2012 at 2:35 PM, Con
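
To put rough numbers on the hypothesis above, here is a minimal sketch that assumes each file is split on its own into ceil(size / split size) pieces and that small files are not combined into a shared split. The function name, the file sizes, and the 256 MB split size are all hypothetical, chosen only for illustration; a real job's task count depends on the input format and cluster settings.

```python
import math

MB = 1024 * 1024


def estimate_map_tasks(file_sizes, split_size):
    """Rough map-task count under a simplified model (an assumption):
    every file yields ceil(size / split_size) splits, at least one per file."""
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


# Hypothetical data: the same 512 MB stored two different ways.
small_files = [8 * MB] * 64      # 64 small files of 8 MB each
merged_files = [256 * MB] * 2    # merged into two 256 MB files
split_size = 256 * MB            # split size chosen to match the merged file size

print(estimate_map_tasks(small_files, split_size))   # 64
print(estimate_map_tasks(merged_files, split_size))  # 2
```

Under this model, merging cuts the estimate from 64 tasks to 2 for the same volume of data. Whether fewer tasks actually runs faster is the open question in this thread, since it also depends on how much parallel hardware is available.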

RE: Hive File Sizes, Merging, and Splits

2012-09-25 Thread Connell, Chuck
Why do you think the current generated code is inefficient? From: John Omernik [mailto:j...@omernik.com] Sent: Tuesday, September 25, 2012 2:57 PM To: user@hive.apache.org Subject: Hive File Sizes, Merging, and Splits I am really struggling trying to make heads or tails out of how to optimize t