But remember that you are running on parallel machines. Depending on the
hardware configuration, more map tasks is BETTER.
From: John Omernik [j...@omernik.com]
Sent: Tuesday, September 25, 2012 7:11 PM
To: user@hive.apache.org
Subject: Re: Hive File Sizes, Merging, and Splits
Isn't there an overhead associated with each map task? Based on that, my
hypothesis is that if I pay attention to my data, merge up small files after
load, and ensure split sizes are close to file sizes, I can keep the
number of map tasks to an absolute minimum.
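
To make the trade-off in this exchange concrete, here is a rough back-of-the-envelope sketch. It is not anything from Hive or Hadoop; the function name, the fixed per-task overhead, the slot count, and the throughput figure are all assumptions made up for illustration. It models a job as "waves" of map tasks, where each task pays a startup overhead and then reads its split at a constant rate.

```python
import math


def estimate_wall_time(total_mb, split_mb, slots, overhead_s, mb_per_s):
    """Toy model of map-phase wall time (illustrative assumptions only).

    total_mb   -- total input size in MB
    split_mb   -- size of each input split in MB (one map task per split)
    slots      -- number of map tasks the cluster can run at once (assumed)
    overhead_s -- fixed startup cost per map task in seconds (assumed)
    mb_per_s   -- read/processing rate of a single map task (assumed)

    Returns (number_of_map_tasks, estimated_seconds).
    """
    tasks = math.ceil(total_mb / split_mb)
    # Tasks beyond the available slots have to wait for a later wave.
    waves = math.ceil(tasks / slots)
    # Simplification: every task is treated as processing a full split.
    per_task_s = overhead_s + split_mb / mb_per_s
    return tasks, waves * per_task_s


if __name__ == "__main__":
    # Hypothetical cluster: 10 GB of input, 40 map slots,
    # 2 s startup per task, 32 MB/s per task.
    for split_mb in (64, 256, 1024):
        tasks, seconds = estimate_wall_time(10240, split_mb, 40, 2.0, 32.0)
        print(f"split={split_mb} MB  tasks={tasks}  est. time={seconds:.1f} s")
```

With these made-up numbers, the 1024 MB split gives the fewest tasks (10) but leaves 30 slots idle and takes the longest, while the 64 MB split pays the startup overhead across four waves. The 256 MB split, which yields exactly one task per slot, comes out fastest. So under this model both points hold: per-task overhead is real, but minimizing the task count below what the hardware can run in parallel costs more than it saves.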
On Tue, Sep 25, 2012 at 2:35 PM, Con
Why do you think the current generated code is inefficient?
From: John Omernik [mailto:j...@omernik.com]
Sent: Tuesday, September 25, 2012 2:57 PM
To: user@hive.apache.org
Subject: Hive File Sizes, Merging, and Splits
I am really struggling trying to make heads or tails out of how to optimize t