Re: Use distribute to spread across reducers

2013-10-03 Thread Timothy Potter
Hi Keith,

Have you tried the TABLESAMPLE command?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling

Tim

On Thu, Oct 3, 2013 at 11:58 AM, Yin Huai wrote:
> Hello Keith,
>
> Hive will not launch a MR job for your query because it basically reads
> all columns from a table
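
For reference, a minimal sketch of what the TABLESAMPLE suggestion could look like for this case. It reuses the table names from the original question; the 1-in-1000 sampling ratio and the alias `s` are assumptions chosen for illustration, and the result holds roughly that fraction of the rows rather than exactly 1000.

```sql
-- Sketch only: large_table and subset_table come from the original question;
-- the bucket count (1000) and the alias s are illustrative assumptions.
-- Sampling ON rand() assigns each row to a random bucket and keeps one bucket,
-- so about 1 row in 1000 survives. The whole table is still scanned.
CREATE TABLE subset_table AS
SELECT *
FROM large_table TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand()) s;
```

Unlike LIMIT, this sampling is applied as rows are read, so it should not need to funnel all rows through a single reducer. The Sampling page linked above also describes block sampling, which may avoid reading the full table, depending on the Hive version and input format in use.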

Re: Use distribute to spread across reducers

2013-10-03 Thread Yin Huai
Hello Keith,

Hive will not launch a MR job for your query because it basically reads all columns from a table. Hive will fetch the data for you directly from the underlying filesystem.

Thanks,

Yin

On Wed, Oct 2, 2013 at 2:48 PM, Keith Wiley wrote:
> I'm trying to create a subset of a large
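
One way to check whether a given statement is served by a direct fetch or by a MapReduce job is to look at its plan with EXPLAIN. A small sketch, assuming the table names from the original question:

```sql
-- Sketch only: table names are taken from the original question.
-- Plan for the bare SELECT. A plan that contains only a fetch stage
-- (and no map-reduce stage) means the rows are read directly.
EXPLAIN
SELECT * FROM large_table LIMIT 1000;

-- Plan for the CREATE TABLE AS SELECT form, for comparison.
EXPLAIN
CREATE TABLE subset_table AS
SELECT * FROM large_table LIMIT 1000;
```

Comparing the two plans shows which stages each statement would run. In Hive 0.10 and later, the `hive.fetch.task.conversion` setting also influences when a query is converted to a plain fetch.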

Use distribute to spread across reducers

2013-10-02 Thread Keith Wiley
I'm trying to create a subset of a large table for testing. The following approach works:

create table subset_table as select * from large_table limit 1000

...but it only uses one reducer. I would like to speed up the process of creating a subset by distributing across multiple reducers. I