Nice, yeah, that would do it.

On Tue, Mar 5, 2013 at 1:26 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
> I typically change my query to query from a limited version of the whole
> table.
>
> Change
>
> select really_expensive_select_clause
> from really_big_table
> where something = something
> group by something
>
> to
>
> select really_expensive_select_clause
> from (
>   select *
>   from really_big_table
>   limit 100
> ) t
> where something = something
> group by something
>
>
> On Tue, Mar 5, 2013 at 10:57 AM, Dean Wampler
> <dean.wamp...@thinkbiganalytics.com> wrote:
> > Unfortunately, it will still go through the whole thing, then just
> > limit the output. However, there's a flag that I think only works in
> > more recent Hive releases:
> >
> > set hive.limit.optimize.enable=true
> >
> > This is supposed to apply the limit earlier in the data stream, so it
> > may give different results than limiting just the output.
> >
> > Like Chuck said, you might consider sampling, but unless your table is
> > organized into buckets, you'll still scan the whole table, though
> > maybe not do all of the computation over it.
> >
> > Also, if you have a small sample data set:
> >
> > set hive.exec.mode.local.auto=true
> >
> > will cause Hive to bypass the Job and Task Trackers, calling the APIs
> > directly, when it can do the whole thing in a single process. Not
> > "lightning fast", but faster.
> >
> > dean
> >
> > On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni <jdant...@yahoo.com> wrote:
> >>
> >> Just add a limit 1 to the end of your query.
> >>
> >> On Mar 5, 2013, at 1:45 PM, Kyle B <kbi...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I was wondering if there is a way to quick-verify a Hive query before
> >> it is run against a big dataset? The tables I am querying against
> >> have millions of records, and I'd like to verify my Hive query before
> >> I run it against all records.
> >>
> >> Is there a way to test the query against a small subset of the data,
> >> without going into full MapReduce?
> >> As silly as this sounds, is there a way to MapReduce without the
> >> overhead of MapReduce? That way I can check my query is doing what I
> >> want before I run it against all records.
> >>
> >> Thanks,
> >>
> >> -Kyle
> >
> >
> > --
> > Dean Wampler, Ph.D.
> > thinkbiganalytics.com
> > +1-312-339-1330

--
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330
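
Putting the thread's suggestions together, a quick pre-flight check might look like the sketch below. This combines Mark's limited-subquery pattern with Dean's two settings; whether the flags take effect depends on the Hive version, and `something` is the thread's own placeholder, not a real column:

```sql
-- Run small jobs in a single local process instead of on the cluster,
-- when Hive judges the input small enough.
set hive.exec.mode.local.auto=true;

-- Ask Hive to apply the LIMIT earlier in the data stream
-- (only honored in more recent Hive releases).
set hive.limit.optimize.enable=true;

-- Verify the query logic against a 100-row slice before the full run.
select really_expensive_select_clause
from (
  select *
  from really_big_table
  limit 100
) t
where something = something
group by something;
```

Once the small run produces sensible output, drop the inner `limit 100` subquery and run the original query against the full table.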
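
For reference, the sampling Dean mentions is Hive's `TABLESAMPLE` clause. A minimal sketch, assuming a hypothetical bucketing column `user_id` (the table is only cheap to sample if it was actually created `CLUSTERED BY (user_id) INTO 32 BUCKETS`; otherwise Hive still scans the full table to build the sample):

```sql
-- Read roughly 1/32 of the data: bucket 1 out of 32, keyed on user_id.
select really_expensive_select_clause
from really_big_table tablesample(bucket 1 out of 32 on user_id) t
where something = something
group by something;
```

On an unbucketed table, `tablesample(bucket 1 out of 32 on rand())` still samples the rows, but only after reading the whole table, which matches Dean's caveat above.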