Re: Map side join

Souvik Banerjee Wed, 12 Dec 2012 12:27:56 -0800

Hi Bejoy,

Yes, I ran the pi example, and it was fine.
Regarding the Hive job, I found that the first map job took 4 hours to
complete.
The map tasks were doing their work but reported status only after
completion. It is indeed taking too long to finish. I could not find
anything relevant in the logs.


Thanks and regards,
Souvik.

On Wed, Dec 12, 2012 at 8:04 AM, <bejoy...@yahoo.com> wrote:

> Hi Souvik
>
> Apart from Hive jobs, are normal MapReduce jobs like wordcount
> running fine on your cluster?
>
> If they are, do you see anything suspicious in the task, TaskTracker,
> or JobTracker logs for the Hive jobs?
>
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * Souvik Banerjee <souvikbaner...@gmail.com>
> *Date: *Tue, 11 Dec 2012 17:12:20 -0600
> *To: *<user@hive.apache.org>; <bejoy...@yahoo.com>
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: Map side join
>
> Hello Everybody,
>
> I need help with a Hive join. Since we were talking about the map-side
> join, I tried it.
> I set the flag: set hive.auto.convert.join=true;
>
> I saw that Hive converts the join to a map join when launching the job.
> But the problem is that none of the map jobs makes progress in my case. I
> made the dataset smaller; it is now only 512 MB joined against 25 MB. I
> was expecting it to finish very quickly.
> No luck with any change of settings.
> When it failed to progress with the default settings, I changed these:
> set hive.mapred.local.mem=1024; // Initially it was 216 I guess
> set hive.join.cache.size=100000; // Initially it was 25000
>
> Also, on the Hadoop side I made this change:
>
> mapred.child.java.opts -Xmx1073741824
>
> But I don't see any progress. After more than 40 minutes of running, I am
> still at 0% map completion.
> Can you please shed some light on this?
>
> Thanks a lot once again.
>
> Regards,
> Souvik.
>
>
>
> On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee
> <souvikbaner...@gmail.com> wrote:
>
>> Hi Bejoy,
>>
>> That's wonderful. Thanks for your reply.
>> I was wondering whether Hive can do a map-side join with more than one
>> condition in the JOIN clause.
>> I'll simply try it out and post the result.
>>
>> Thanks once again.
>>
>> Regards,
>> Souvik.
>>
>>  On Fri, Dec 7, 2012 at 2:10 PM, <bejoy...@yahoo.com> wrote:
>>
>>> Hi Souvik
>>>
>>> In earlier versions of Hive you had to give the map join hint, but in
>>> later versions you can just set hive.auto.convert.join = true;
>>> Hive then automatically selects the smaller table. It is better to give
>>> the smaller table as the first one in the join.
>>>
>>> You can use a map join if you are joining a small table with a large
>>> one, in terms of data size. By small, I mean it is better for the smaller
>>> table to be in the range of MBs.
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from remote device, Please excuse typos
>>> ------------------------------
>>> *From: *Souvik Banerjee <souvikbaner...@gmail.com>
>>> *Date: *Fri, 7 Dec 2012 13:58:25 -0600
>>> *To: *<user@hive.apache.org>
>>> *ReplyTo: *user@hive.apache.org
>>> *Subject: *Map side join
>>>
>>> Hello everybody,
>>>
>>> I have a question. I didn't come across any post that says
>>> something about this.
>>> I have two tables, let's say A and B.
>>> I want to join A and B in Hive. I am currently using Hive 0.9.
>>> The join would be on a few columns, like ON (A.id1 = B.id1) AND (A.id2 =
>>> B.id2) AND (A.id3 = B.id3)
>>>
>>> Can I ask Hive to use a map-side join in this scenario? Should I give
>>> Hive a hint by saying /*+mapjoin(B)*/?
>>>
>>> Get back to me if you want any more information in this regard.
>>>
>>> Thanks and regards,
>>> Souvik.
>>>
>>
>>
>
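
For reference, the two approaches discussed in this thread (the explicit hint and automatic conversion) might look roughly like this for the three-column join from the original question. This is only a sketch under assumptions: tables A and B and the columns id1..id3 come from the question, the SELECT list is purely illustrative, and the value shown for hive.mapjoin.smalltable.filesize is what I believe the default to be, not something confirmed in this thread.

```sql
-- Option 1: explicit hint, as needed in earlier Hive versions.
-- B is assumed to be the small table that is held in memory by each mapper.
SELECT /*+ MAPJOIN(B) */ A.*
FROM A JOIN B
  ON (A.id1 = B.id1 AND A.id2 = B.id2 AND A.id3 = B.id3);

-- Option 2: let Hive convert the join automatically.
SET hive.auto.convert.join=true;
-- Assumed size threshold (in bytes) below which a table counts as "small".
-- A 25 MB table sits right around this value, so it may be worth checking.
SET hive.mapjoin.smalltable.filesize=25000000;

-- Inspect the plan before running the real query.
EXPLAIN
SELECT A.*
FROM A JOIN B
  ON (A.id1 = B.id1 AND A.id2 = B.id2 AND A.id3 = B.id3);
```

Running EXPLAIN first is a cheap way to check whether the plan actually contains a map join stage before committing to a long-running job.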
