Basically
1. if you join table try to filter out as much as possible in WHERE (to
reduce amount of data sent form map to reduce step)
2. if you join big table with small table (< 500 MB) use SELECT /*+
MAPJOIN(small_table) */  hint to avoid reduce step.

3. if you join big table with big table make sure you choose right number
of reduce tasks. check "Reduce shuffle bytes" counter for several reducers
on Job Tracker UI.
from my experience it should not be more than 1 - 2 GB
play with
set mapred.reduce.tasks=xxx
to make Reduce shuffle bytes < 1-2 GB





On Thu, Jun 13, 2013 at 10:08 PM, bharath vissapragada <
bharathvissapragada1...@gmail.com> wrote:

> Hi,
>
> Can you please paste the output of "explain" for both the queries so that
> we can know if the predicate is being pushed down in both the cases.
>
> Thanks,
>
>
>
> On Fri, Jun 14, 2013 at 10:05 AM, Naga Vijay <nvti...@gmail.com> wrote:
>
>> Hi,
>>
>> Here are the test results for the two query options shown in my initial
>> email -
>>
>> Test Criteria - Both tables have 20 million rows each, 20 partitions, 16
>> buckets for item_id
>>
>> Query Option 1 - Took 25 minutes, 5 seconds
>> Query Option 2 - Took 24 minutes, 42 seconds
>>
>> Thanks for all the responses!
>>
>> Naga
>>
>>
>> On Thu, Jun 13, 2013 at 6:42 PM, Navis류승우 <navis....@nexr.com> wrote:
>>
>>> You can use "explain" for confirming differences. For inner joins, it
>>> would make the same plan.
>>>
>>> 2013/6/14 Igor Tatarinov <i...@decide.com>:
>>> > I would expect no difference because of predicate pushdown.
>>> >
>>> > igor
>>> > decide.com
>>> >
>>> >
>>> > On Thu, Jun 13, 2013 at 11:31 AM, Naga Vijay <nvti...@gmail.com>
>>> wrote:
>>> >>
>>> >> Sure, Will do
>>> >>
>>> >>
>>> >> On Thu, Jun 13, 2013 at 10:42 AM, Stephen Sprague <sprag...@gmail.com
>>> >
>>> >> wrote:
>>> >>>
>>> >>> Hi naja,
>>> >>> test those two versions (or three now) and report back to the group.
>>>  :)
>>> >>> even if some smarty-pants thinks he knows the answer its always good
>>> to
>>> >>> confirm things are as they should be.
>>> >>>
>>> >>>
>>> >>> On Wed, Jun 12, 2013 at 11:54 PM, Sanjay Subramanian
>>> >>> <sanjay.subraman...@wizecommerce.com> wrote:
>>> >>>>
>>> >>>> Hi
>>> >>>>
>>> >>>> I would actually do it like this…so that the set on the left of JOIN
>>> >>>> becomes smaller
>>> >>>>
>>> >>>> SELECT a.item_id, a.create_dt
>>> >>>> FROM
>>> >>>>      ( SELECT
>>> >>>>                     item_id, create_dt
>>> >>>>       FROM
>>> >>>>                     A
>>> >>>>       WHERE
>>> >>>>                    item_id = 'I001'
>>> >>>>        AND
>>> >>>>                   category_name = 'C001'
>>> >>>>       )  a
>>> >>>> JOIN
>>> >>>>          b
>>> >>>> ON
>>> >>>>         a.item_id = b.item_id
>>> >>>> ;
>>> >>>>
>>> >>>>
>>> >>>> From: Naga Vijay <nvti...@gmail.com>
>>> >>>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>>> >>>> Date: Wednesday, June 12, 2013 9:17 PM
>>> >>>> To: "user@hive.apache.org" <user@hive.apache.org>
>>> >>>> Subject: Enhancing Query Join to speed up Query
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> Which of the two query options is better?
>>> >>>>
>>> >>>> SELECT a.item_id, a.create_dt
>>> >>>> FROM   a JOIN b
>>> >>>> ON     (a.item_id = b.item_id)
>>> >>>> WHERE  a.item_id = 'I001'
>>> >>>> AND    a.category_name = 'C001';
>>> >>>>
>>> >>>> - or -
>>> >>>>
>>> >>>> SELECT a.item_id, a.create_dt
>>> >>>> FROM   a JOIN b
>>> >>>> ON     (a.item_id = b.item_id AND a.item_id = 'I001')
>>> >>>> WHERE  a.category_name = 'C001';
>>> >>>>
>>> >>>> Thanks
>>> >>>> Naga
>>> >>>>
>>> >>>> CONFIDENTIALITY NOTICE
>>> >>>> ======================
>>> >>>> This email message and any attachments are for the exclusive use of
>>> the
>>> >>>> intended recipient(s) and may contain confidential and privileged
>>> >>>> information. Any unauthorized review, use, disclosure or
>>> distribution is
>>> >>>> prohibited. If you are not the intended recipient, please contact
>>> the sender
>>> >>>> by reply email and destroy all copies of the original message along
>>> with any
>>> >>>> attachments, from your computer system. If you are the intended
>>> recipient,
>>> >>>> please be advised that the content of this message is subject to
>>> access,
>>> >>>> review and disclosure by the sender's Email System Administrator.
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

Reply via email to