Re: map side join

2015-04-30 Thread Abe Weinograd
Great info, thanks. Makes sense on the partition since those files can be shipped by themselves. These are "reference" tables, but one happens to be pretty long. Thanks, Abe On Thu, Apr 30, 2015 at 12:54 PM, Gopal Vijayaraghavan wrote: > Hi, > > > Using CDH 5.3 - Hive 0.13. Does a view help

Re: map side join

2015-04-30 Thread Gopal Vijayaraghavan
Hi, > Using CDH 5.3 - Hive 0.13. Does a view help here? Does how i format the table help in reducing size? No, a view does not help - they are not materialized and you need hive-1.0 to have temporary table support. The only way out is if you only have 1 filter column in the system. I assume

Re: map side join

2015-04-30 Thread Abe Weinograd
Using CDH 5.3 - Hive 0.13. Does a view help here? Does how i format the table help in reducing size? Abe On Thu, Apr 30, 2015 at 11:07 AM, Gopal Vijayaraghavan wrote: > Hi, > > > its submitting the whole table to the job. if I use a view with the > >filter > > baked in, will that help? I do

Re: map side join

2015-04-30 Thread Gopal Vijayaraghavan
Hi, > its submitting the whole table to the job. if I use a view with the filter baked in, will that help? I don't want to have to jack up the JVM for the client/HiveServer2 to accommodate the full table. Which hive version are you using? If you're on a recent version like hive-1.0, this

Re: Map-side join memory limit is too low

2014-02-03 Thread Lefty Leverenz
Searching the JIRA for HADOOP_HEAPSIZE turned up this new ticket (and related ones mentioned in the comments): HADOOP-10245 : The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with "-Xmx" options twice. The impact is that

Re: Map-side join memory limit is too low

2014-02-02 Thread Navis류승우
try "set hive.mapred.local.mem=7000" or add it to hive-site.xml instead of modifying hive-env.sh HADOOP_HEAPSIZE is not in use. Should fix documentation of it. Thanks, Navis 2014-01-31 Avrilia Floratou : > Hi, > I'm running hive 0.12 on yarn and I'm trying to convert a common join into > a map

Re: Map side join

2012-12-27 Thread Souvik Banerjee
ue like 64mb for >> min and max split size. >> >> Mapred.min.split.size and mapred.max.split.size >> >> Regards >> Bejoy KS >> >> Sent from remote device, Please excuse typos >> -- >> *From: * Souvik Banerjee >> *
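The split-size suggestion quoted in this snippet could be applied in a Hive session roughly as follows. This is a minimal sketch, assuming "64mb" means 64 * 1024 * 1024 bytes and that the older `mapred.*` property names mentioned in the thread apply to the Hadoop version in use.

```sql
-- Sketch only: set both split-size bounds to 64 MB (64 * 1024 * 1024 bytes).
-- The property names are the ones mentioned in the thread.
SET mapred.min.split.size=67108864;
SET mapred.max.split.size=67108864;
```

Smaller splits generally mean more map tasks for the same input, so whether this helps depends on how many free slots the cluster has.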

Re: map side join with group by

2012-12-13 Thread Chen Song
Thanks Nitin. This is all I want to clarify :) Chen On Thu, Dec 13, 2012 at 2:30 PM, Nitin Pawar wrote: > to improve the speed of the job they created map only joins so that all > the records associated with a key fall to a map .. reducers slows it down. > If the reducer has to do some more job

Re: map side join with group by

2012-12-13 Thread Nitin Pawar
To improve the speed of the job they created map-only joins so that all the records associated with a key fall to a map; reducers slow it down. If the reducer has to do some more work, then they launch another job. Bear in mind, when we say map-only join we are absolutely sure that speed will inc

Re: map side join with group by

2012-12-13 Thread Chen Song
Nitin Yeah. My original question is whether there is a way to force Hive (or rather, whether it is possible) to execute the map side join in the mapper phase and the group by in the reduce phase. So instead of launching a map only job (join) and a map reduce job (group by), doing it all together in a single MR job. This

Re: map side join with group by

2012-12-13 Thread Nitin Pawar
Chen, in a map-side join there are no reducers; it's a MAP ONLY job. On Thu, Dec 13, 2012 at 11:54 PM, Chen Song wrote: > Understood that fact that it is impossible in the same MR job if both join > and group by are gonna happen in the reduce phase (because the join keys > and group by keys are di

Re: Map side join

2012-12-13 Thread Souvik Banerjee
KS > > Sent from remote device, Please excuse typos > -- > *From: * Souvik Banerjee > *Date: *Thu, 13 Dec 2012 12:00:16 -0600 > *To: *; > *Subject: *Re: Map side join > > Hi Bejoy, > > The input files are non-compressed text file. > There

Re: map side join with group by

2012-12-13 Thread Chen Song
Understood the fact that it is impossible in the same MR job if both join and group by are going to happen in the reduce phase (because the join keys and group by keys are different). But for map side join, the joins would be complete by the end of the map phase, and outputs should be ready to be dis

Re: Map side join

2012-12-13 Thread bejoy_ks
from remote device, Please excuse typos -Original Message- From: Souvik Banerjee Date: Thu, 13 Dec 2012 12:00:16 To: ; Subject: Re: Map side join Hi Bejoy, The input files are non-compressed text file. There are enough free slots in the cluster. Can you please let me know can I increase

Re: Map side join

2012-12-13 Thread Souvik Banerjee
e > *Date: *Wed, 12 Dec 2012 14:27:27 -0600 > *To: *; > *ReplyTo: * user@hive.apache.org > *Subject: *Re: Map side join > > Hi Bejoy, > > Yes I ran the pi example. It was fine. > Regarding the HIVE Job what I found is that it took 4 hrs for the first > map job to get

Re: Map side join

2012-12-13 Thread bejoy_ks
Banerjee Date: Wed, 12 Dec 2012 14:27:27 To: ; Reply-To: user@hive.apache.org Subject: Re: Map side join Hi Bejoy, Yes I ran the pi example. It was fine. Regarding the HIVE Job what I found is that it took 4 hrs for the first map job to get completed. Those map tasks were doing their job and

Re: map side join with group by

2012-12-13 Thread Nitin Pawar
That's because the join keys of the first job and the group by keys of the second job are different. You just can't assume the join keys and group by keys will be the same, so they are two different jobs. On Thu, Dec 13, 2012 at 8:26 PM, Chen Song wrote: > Yeah, my abridged version of query might be a lit

Re: map side join with group by

2012-12-13 Thread Chen Song
Yeah, my abridged version of query might be a little broken but my point is that when a query has a map join and group by, even in its simplified incarnation, it will launch two jobs. I was just wondering why map join and group by cannot be accomplished in one MR job. Best, Chen On Thu, Dec 13, 2

Re: map side join with group by

2012-12-12 Thread Nitin Pawar
I think Chen wanted to know why this is a two-phase query, if I understood it correctly. When you run a map-side join, it just performs the join query; after that, to execute the group by part, it launches the second job. I may be wrong but this is how I saw it whenever I executed group by queries
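One way to check how many stages Hive plans for a map join followed by a group by is to look at the query plan. The following is a sketch only: the tables `fact` and `dim` and their columns are hypothetical and do not come from the thread.

```sql
-- Hypothetical tables: fact (large) and dim (small enough to hold in memory).
SET hive.auto.convert.join=true;

-- EXPLAIN prints the plan without running the query.
EXPLAIN
SELECT d.category, COUNT(*) AS cnt
FROM fact f
JOIN dim d ON f.dim_id = d.id
GROUP BY d.category;
```

The STAGE DEPENDENCIES section of the output shows which stages exist and how they depend on each other, which makes it possible to see whether the join and the group by end up in separate jobs on a given Hive version.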

Re: map side join with group by

2012-12-12 Thread Mark Grover
Hi Chen, I think we would need some more information. The query is referring to a table called "d" in the MAPJOIN hint but there is no such table in the query. Moreover, map joins only make sense when the right table is the one being "mapped" (in other words, being kept in memory) in case of a Le

Re: Map side join

2012-12-12 Thread Souvik Banerjee
g skeptical in > task, Tasktracker or jobtracker logs? > > > Regards > Bejoy KS > > Sent from remote device, Please excuse typos > -- > *From: * Souvik Banerjee > *Date: *Tue, 11 Dec 2012 17:12:20 -0600 > *To: *; > *ReplyTo: * user

Re: Map side join

2012-12-12 Thread bejoy_ks
-Original Message- From: Souvik Banerjee Date: Tue, 11 Dec 2012 17:12:20 To: ; Reply-To: user@hive.apache.org Subject: Re: Map side join Hello Everybody, Need help in for on HIVE join. As we were talking about the Map side join I tried that. I set the flag set hive.auto.convert.join=true; I

Re: Map side join

2012-12-11 Thread Souvik Banerjee
Hello Everybody, Need help with a HIVE join. As we were talking about the map side join, I tried that. I set the flag set hive.auto.convert.join=true; I saw Hive converts the same to map join while launching the job. But the problem is that none of the map jobs progress in my case. I made the

Re: Map side join

2012-12-07 Thread Souvik Banerjee
Hi Bejoy, That's wonderful. Thanks for your reply. What I was wondering if HIVE can do map side join with more than one condition on JOIN clause. I'll simply try it out and post the result. Thanks once again. Regards, Souvik. On Fri, Dec 7, 2012 at 2:10 PM, wrote: > ** > Hi Souvik > > In earl

Re: Map side join

2012-12-07 Thread bejoy_ks
Hi Souvik In earlier versions of hive you had to give the map join hint. But in later versions just set hive.auto.convert.join = true; Hive automatically selects the smaller table. It is better to give the smaller table as the first one in join. You can use a map join if you are joining a smal
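The two approaches described in this reply can be sketched as below. The table names `big_table` and `small_table` and their columns are hypothetical; the threshold value shown is only an illustration and should be checked against the Hive version in use.

```sql
-- Older style: name the small table (by alias) in a MAPJOIN hint.
SELECT /*+ MAPJOIN(s) */ b.id, s.label
FROM big_table b
JOIN small_table s ON b.key = s.key;

-- Later style: let Hive convert the join automatically.
SET hive.auto.convert.join=true;
-- Size threshold (in bytes) below which a table counts as "small".
SET hive.mapjoin.smalltable.filesize=25000000;

SELECT b.id, s.label
FROM big_table b
JOIN small_table s ON b.key = s.key;
```

With automatic conversion the hint is not needed, but the small table still has to fit in the memory available to the local task that builds the hash table.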

Re: Map side join and Serde jar in distributed cache missing

2012-09-24 Thread Aniket Mokashi
Just a guess- Put your jar on hadoop classpath. On Mon, Sep 24, 2012 at 5:45 PM, Abhishek Pratap Singh wrote: > i m using hive-0.7.1 > > > On Mon, Sep 24, 2012 at 5:10 PM, Edward Capriolo wrote: > >> I have noticed this as well with hive 0.7.0. Not sure what CDH is >> based on but newer versions

Re: Map side join and Serde jar in distributed cache missing

2012-09-24 Thread Abhishek Pratap Singh
i m using hive-0.7.1 On Mon, Sep 24, 2012 at 5:10 PM, Edward Capriolo wrote: > I have noticed this as well with hive 0.7.0. Not sure what CDH is > based on but newer versions could suffer as well. What version of hive > do you have? > > On Mon, Sep 24, 2012 at 7:30 PM, Abhishek Pratap Singh > wr

Re: Map side join and Serde jar in distributed cache missing

2012-09-24 Thread Edward Capriolo
I have noticed this as well with hive 0.7.0. Not sure what CDH is based on but newer versions could suffer as well. What version of hive do you have? On Mon, Sep 24, 2012 at 7:30 PM, Abhishek Pratap Singh wrote: > Hi all, > > I have enabled automatic Map join for any table less than 50MB. This ta

Re: Map side join

2012-06-18 Thread Aniket Mokashi
Hive also has something called uniquejoin. Maybe you are looking for that. I cannot find documentation for your reference but you can do a jira search. It allows you to join multiple sources with the same key, map-side. (all sources should have the same key) ~Aniket On Wed, Jun 13, 2012 a

RE: Map side join

2012-06-13 Thread Tucker, Matt
Hi, Assuming that 4 tables are small enough to fit in the Distributed Cache, the joins between the tables all need to join against a common key. Example: set hive.auto.convert.join=true; SELECT * FROM large JOIN smalla ON large.key = smalla.key1 JOIN smallb ON large.key =
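The condition stated in this reply, that every small table joins against the same key of the large table, could look like the sketch below. The names `facts`, `dim_a`, `dim_b` and `dim_c` and all column names are hypothetical; they are not the tables from the truncated example.

```sql
-- Hypothetical schema: one large table and three small lookup tables.
SET hive.auto.convert.join=true;

-- Every join condition uses the same column of the large table (facts.k).
SELECT f.k, a.name_a, b.name_b, c.name_c
FROM facts f
JOIN dim_a a ON f.k = a.k
JOIN dim_b b ON f.k = b.k
JOIN dim_c c ON f.k = c.k;
```

If one of the joins used a different column of `facts`, the common-key condition from the reply would no longer hold for this query.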