The total number of bytes of the input will be used to determine whether or not to launch a map-reduce job for this query. That was in my original mail.
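To make the size check concrete, here is a rough sketch. The function name, the list-of-file-sizes input, and the 100 MB default are all illustrative assumptions (the limit is just the figure suggested in the thread below), not Hive's actual code or configuration:

```python
# Hypothetical sketch only: none of these names come from Hive.
# The 100 MB default mirrors the limit suggested in the thread.
DEFAULT_THRESHOLD_BYTES = 100 * 1024 * 1024


def needs_mapreduce(input_file_sizes, threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Return True if the total input size exceeds the threshold.

    input_file_sizes: iterable of file sizes in bytes for the query's input.
    """
    total = 0
    for size in input_file_sizes:
        total += size
        # Stop early once the limit is crossed; no need to sum the rest.
        if total > threshold_bytes:
            return True
    return False
```

With this, a query over two 10 MB files would be fetched locally, while one over a 60 MB and a 50 MB file would still launch a job. Note that it only looks at the raw input size, not at how much of it a filter would keep.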
However, given any complex where condition and the lack of column statistics in hive, we cannot determine the number of bytes that would be needed to satisfy the where condition.

On 7/31/12 7:07 AM, "Navis류승우" <navis....@nexr.com> wrote:

>It supports table sampling also.
>
>select * from src TABLESAMPLE (BUCKET 1 OUT OF 40 ON key);
>select * from src TABLESAMPLE (0.25 PERCENT);
>
>But there is no sampling option specifying number of bytes. This can be
>done in another issue.
>
>2012/7/31 Owen O'Malley <omal...@apache.org>
>
>> On Sat, Jul 28, 2012 at 6:17 PM, Navis류승우 <navis....@nexr.com> wrote:
>>
>> > I was thinking of timeout for fetching, 2000msec for example. How about
>> > that?
>> >
>>
>> Instead of time, which requires launching the query and letting it timeout,
>> how about determining the number of bytes that would need to be fetched to
>> the local box? Limiting it to 100 or 200 mb seems reasonable.
>>
>> -- Owen
>>