Thanks for the help.
What I did earlier was change the configuration in HDFS and create the
table. I expected the block size of the new table to be 32 MB. But I found
that when using Cloudera Manager you need to deploy the configuration
change for both HDFS and MapReduce (I had done it only for HDFS).
I then deleted the old table and recreated it, and now I can launch more
mappers.
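
For the record, a sketch of how the existing files could be re-copied so
they pick up the new 32 MB block size (the paths here are made up):

hadoop fs -D dfs.block.size=33554432 -put /local/data/file.txt /user/hive/warehouse/mytable/

Setting dfs.block.size=33554432 in the Hive session before re-inserting
the data should have the same effect.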
Thanks a lot once again. I will post what happens with the additional mappers.

Thanks and regards,
Souvik.

On Thu, Dec 13, 2012 at 12:06 PM, <bejoy...@yahoo.com> wrote:

> Hi Souvik
>
> To have the new HDFS block size take effect on the already existing files,
> you need to re-copy them into HDFS.
>
> To play with the number of mappers, you can set a smaller value, such as
> 64 MB, for the min and max split size:
>
> mapred.min.split.size and mapred.max.split.size
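>
> For example, for 64 MB splits (a sketch, set in the Hive session):
>
> set mapred.min.split.size=67108864;
> set mapred.max.split.size=67108864;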
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> From: Souvik Banerjee <souvikbaner...@gmail.com>
> Date: Thu, 13 Dec 2012 12:00:16 -0600
> To: <user@hive.apache.org>; <bejoy...@yahoo.com>
> Subject: Re: Map side join
>
> Hi Bejoy,
>
> The input files are non-compressed text files.
> There are enough free slots in the cluster.
>
> Can you please let me know how I can increase the number of mappers?
> I tried reducing the HDFS block size from 128 MB to 32 MB, expecting to
> get more mappers. But it still launches the same number of mappers as it
> did when the HDFS block size was 128 MB. I have enough map slots
> available, but I am not able to utilize them.
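>
> For reference, one way to check the block size of the files already in
> HDFS (a sketch; the warehouse path here is made up):
>
> hadoop fsck /user/hive/warehouse/mytable -files -blocks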
>
>
> Thanks and regards,
> Souvik.
>
>
> On Thu, Dec 13, 2012 at 11:12 AM, <bejoy...@yahoo.com> wrote:
>
>> Hi Souvik
>>
>> Are your input files compressed using a non-splittable compression
>> codec?
>>
>> Do you have enough free slots while this job is running?
>>
>> Make sure that the job is not running locally.
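>>
>> For example, from the Hive CLI this prints the current value (a sketch;
>> with Hadoop 1.x the job runs locally when it is set to "local"):
>>
>> set mapred.job.tracker;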
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> From: Souvik Banerjee <souvikbaner...@gmail.com>
>> Date: Wed, 12 Dec 2012 14:27:27 -0600
>> To: <user@hive.apache.org>; <bejoy...@yahoo.com>
>> Reply-To: user@hive.apache.org
>> Subject: Re: Map side join
>>
>> Hi Bejoy,
>>
>> Yes, I ran the pi example. It was fine.
>> Regarding the Hive job, what I found is that it took 4 hrs for the first
>> map job to complete.
>> Those map tasks were doing their job and only reported status after
>> completion. It is indeed taking too long to finish. I could not find
>> anything relevant in the logs.
>>
>> Thanks and regards,
>> Souvik.
>>
>> On Wed, Dec 12, 2012 at 8:04 AM, <bejoy...@yahoo.com> wrote:
>>
>>> Hi Souvik
>>>
>>> Apart from Hive jobs, are normal MapReduce jobs like wordcount running
>>> fine on your cluster?
>>>
>>> If those are working, for the Hive jobs are you seeing anything
>>> suspicious in the task, TaskTracker, or JobTracker logs?
>>>
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from remote device, Please excuse typos
>>> ------------------------------
>>> From: Souvik Banerjee <souvikbaner...@gmail.com>
>>> Date: Tue, 11 Dec 2012 17:12:20 -0600
>>> To: <user@hive.apache.org>; <bejoy...@yahoo.com>
>>> Reply-To: user@hive.apache.org
>>> Subject: Re: Map side join
>>>
>>> Hello Everybody,
>>>
>>> I need help with a Hive join. As we were talking about the map-side
>>> join, I tried that.
>>> I set the flag: set hive.auto.convert.join=true;
>>>
>>> I saw Hive convert the query to a map join while launching the job. But
>>> the problem is that none of the map tasks progress in my case. I made the
>>> dataset smaller; now it's only 512 MB against 25 MB. I was expecting it to
>>> be done very quickly.
>>> No luck with any change of settings.
>>> Since it failed to progress with the defaults, I changed these settings:
>>> set hive.mapred.local.mem=1024; // Initially it was 216 I guess
>>> set hive.join.cache.size=100000; // Initially it was 25000
>>>
>>> Also, on the Hadoop side I made this change:
>>>
>>> mapred.child.java.opts -Xmx1073741824
>>>
>>> But I don't see any progress; after more than 40 minutes of running, I am
>>> still at 0% map completion.
>>> Can you please throw some light on this?
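>>>
>>> One thing I can try (just a sketch; the table and column names below are
>>> made up) is to run EXPLAIN on the query and check whether the plan shows
>>> a Map Join Operator:
>>>
>>> EXPLAIN SELECT a.id FROM big_t a JOIN small_t b ON (a.id = b.id);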
>>>
>>> Thanks a lot once again.
>>>
>>> Regards,
>>> Souvik.
>>>
>>>
>>>
>>> On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee <souvikbaner...@gmail.com> wrote:
>>>
>>>> Hi Bejoy,
>>>>
>>>> That's wonderful. Thanks for your reply.
>>>> What I was wondering is whether Hive can do a map-side join with more
>>>> than one condition in the JOIN clause.
>>>> I'll simply try it out and post the result.
>>>>
>>>> Thanks once again.
>>>>
>>>> Regards,
>>>> Souvik.
>>>>
>>>>  On Fri, Dec 7, 2012 at 2:10 PM, <bejoy...@yahoo.com> wrote:
>>>>
>>>>> Hi Souvik
>>>>>
>>>>> In earlier versions of Hive you had to give the map join hint, but in
>>>>> later versions you can just set hive.auto.convert.join=true; and Hive
>>>>> automatically selects the smaller table. It is better to give the
>>>>> smaller table as the first one in the join.
>>>>>
>>>>> You can use a map join if you are joining a small table with a large
>>>>> one, in terms of data size. By small, it is better for the smaller
>>>>> table's size to be in the range of MBs.
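>>>>>
>>>>> For example, a minimal sketch (the table names small_t and big_t are
>>>>> made up), with the smaller table first:
>>>>>
>>>>> set hive.auto.convert.join=true;
>>>>> SELECT s.id, b.val FROM small_t s JOIN big_t b ON (s.id = b.id);
>>>>>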
>>>>> Regards
>>>>> Bejoy KS
>>>>>
>>>>> Sent from remote device, Please excuse typos
>>>>> ------------------------------
>>>>> From: Souvik Banerjee <souvikbaner...@gmail.com>
>>>>> Date: Fri, 7 Dec 2012 13:58:25 -0600
>>>>> To: <user@hive.apache.org>
>>>>> Reply-To: user@hive.apache.org
>>>>> Subject: Map side join
>>>>>
>>>>> Hello everybody,
>>>>>
>>>>> I have a question; I didn't come across any post that says anything
>>>>> about this.
>>>>> I have two tables, let's say A and B.
>>>>> I want to join A and B in Hive. I am currently using Hive 0.9.
>>>>> The join would be on a few columns, like ON (A.id1 = B.id1) AND
>>>>> (A.id2 = B.id2) AND (A.id3 = B.id3).
>>>>>
>>>>> Can I ask Hive to use a map-side join in this scenario? Should I give
>>>>> Hive a hint by saying /*+ MAPJOIN(B) */?
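>>>>>
>>>>> In other words, a sketch of the query I have in mind (the select list
>>>>> is just an example):
>>>>>
>>>>> SELECT /*+ MAPJOIN(B) */ A.id1, A.id2, A.id3
>>>>> FROM A JOIN B
>>>>> ON (A.id1 = B.id1) AND (A.id2 = B.id2) AND (A.id3 = B.id3);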
>>>>>
>>>>> Get back to me if you want any more information in this regard.
>>>>>
>>>>> Thanks and regards,
>>>>> Souvik.
>>>>>
>>>>
>>>>
>>>
>>
>
