Re: non map-reduce for simple queries

Namit Jain Mon, 30 Jul 2012 21:13:30 -0700

The total number of bytes of the input will be used to determine whether
or not to launch a map-reduce job for this query. That was in my original
mail.


However, given any complex WHERE condition and the lack of column
statistics in Hive, we cannot determine the number of bytes that would
need to be read to satisfy the WHERE condition.
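
To make the check concrete, here is a rough sketch in Python (not Hive's actual code). The names SimpleQuery, should_skip_map_reduce and MAX_FETCH_BYTES are made up for illustration, and the 200 MB limit is just the figure floated in this thread. The point is that the decision uses the total input size, and a WHERE clause does not lower that estimate because its selectivity is unknown.

```python
from dataclasses import dataclass, field
from typing import List

MB = 1024 * 1024

# Hypothetical limit; 100 or 200 MB was suggested in this thread.
MAX_FETCH_BYTES = 200 * MB


@dataclass
class SimpleQuery:
    """Illustrative stand-in for a simple select over some input files."""
    input_file_sizes: List[int] = field(default_factory=list)  # bytes per file
    has_where_clause: bool = False


def estimated_input_bytes(query: SimpleQuery) -> int:
    # Without column statistics the selectivity of a WHERE clause is
    # unknown, so the estimate is the full input size either way.
    return sum(query.input_file_sizes)


def should_skip_map_reduce(query: SimpleQuery,
                           max_bytes: int = MAX_FETCH_BYTES) -> bool:
    """Return True if the input is small enough to fetch directly."""
    return estimated_input_bytes(query) <= max_bytes
```

With these assumptions, a query over 150 MB of input would be fetched directly, while one over 250 MB would still launch a map-reduce job even if its WHERE clause matched only a few rows.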



On 7/31/12 7:07 AM, "Navis류승우" <navis....@nexr.com> wrote:

>It supports table sampling also.
>
>select * from src TABLESAMPLE (BUCKET 1 OUT OF 40 ON key);
>select * from src TABLESAMPLE (0.25 PERCENT);
>
>But there is no sampling option specifying number of bytes. This can be
>done in another issue.
>
>2012/7/31 Owen O'Malley <omal...@apache.org>
>
>> On Sat, Jul 28, 2012 at 6:17 PM, Navis류승우 <navis....@nexr.com> wrote:
>>
>> > I was thinking of timeout for fetching, 2000msec for example. How about
>> > that?
>> >
>>
>> Instead of time, which requires launching the query and letting it timeout,
>> how about determining the number of bytes that would need to be fetched to
>> the local box? Limiting it to 100 or 200 mb seems reasonable.
>>
>> -- Owen
>>
