Hi Keith,
Have you tried the TABLESAMPLE clause?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
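Here is a minimal sketch of how TABLESAMPLE might apply to Keith's case. It assumes a Hive version that supports bucket sampling ON rand(). The table names come from his message. The 1-in-1000 ratio is an illustrative assumption, not a tuned value. Because each row is assigned to a random bucket, the sampling filter runs in the map tasks and no single LIMIT reducer gathers the subset. The trade-off is that the result is only approximately, not exactly, 1/1000 of the rows:

```sql
-- Sketch: build an approximately 1-in-1000 random subset of large_table.
-- ON rand() assigns each row to a random bucket, so the filter is applied
-- per row in the mappers instead of in a single LIMIT reducer.
-- The ratio (BUCKET 1 OUT OF 1000) is an illustrative assumption.
CREATE TABLE subset_table AS
SELECT *
FROM large_table TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand()) s;
```

Since rand() is unseeded here, rerunning the statement gives a different subset. Adding a LIMIT on top would guarantee a fixed row count, but it would also reintroduce the single reducer.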
Tim
On Thu, Oct 3, 2013 at 11:58 AM, Yin Huai wrote:
Hello Keith,
Hive will not launch a MapReduce job for your query because it simply reads all
columns from a table. Hive will fetch the data for you directly from the
underlying filesystem.
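To check this on a particular installation, you could prefix the query with EXPLAIN, which prints the stage plan without running anything. A plan consisting only of a fetch stage means no MapReduce job will be launched. Whether a simple SELECT ... LIMIT becomes a fetch-only plan may depend on the Hive version and its settings, so treat this as a sketch:

```sql
-- Print the execution plan for the plain query without running it.
EXPLAIN SELECT * FROM large_table LIMIT 1000;

-- Inspecting the CREATE TABLE ... AS form the same way shows the stages
-- Hive would use to write subset_table.
EXPLAIN CREATE TABLE subset_table AS SELECT * FROM large_table LIMIT 1000;
```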
Thanks,
Yin
On Wed, Oct 2, 2013 at 2:48 PM, Keith Wiley wrote:
I'm trying to create a subset of a large table for testing. The following
approach works:
create table subset_table as
select * from large_table limit 1000
...but it only uses one reducer. I would like to speed up the process of
creating a subset by distributing it across multiple reducers. I