Re: map side join with group by

2012-12-13 Thread Chen Song
Thanks Nitin. This is all I wanted to clarify :) Chen On Thu, Dec 13, 2012 at 2:30 PM, Nitin Pawar wrote: > to improve the speed of the job they created map only joins so that all > the records associated with a key fall to a map .. reducers slow it down. > If the reducer has to do some more job

Re: map side join with group by

2012-12-13 Thread Nitin Pawar
To improve the speed of the job, they created map only joins so that all the records associated with a key fall to a map; reducers slow it down. If the reducer has to do some more work, they launch another job. Bear in mind, when we say map only join we are absolutely sure that speed will inc

Re: map side join with group by

2012-12-13 Thread Chen Song
Nitin, yeah. My original question is whether there is a way to force Hive (or rather, whether it is possible at all) to execute the map side join in the map phase and the group by in the reduce phase. So instead of launching a map only job (join) and a map reduce job (group by), doing it all together in a single MR job. This

Re: map side join with group by

2012-12-13 Thread Nitin Pawar
Chen, in a map side join there are no reducers; it's a MAP ONLY job. On Thu, Dec 13, 2012 at 11:54 PM, Chen Song wrote: > Understood the fact that it is impossible in the same MR job if both join > and group by are gonna happen in the reduce phase (because the join keys > and group by keys are di

Re: map side join with group by

2012-12-13 Thread Chen Song
Understood the fact that it is impossible in the same MR job if both join and group by are gonna happen in the reduce phase (because the join keys and group by keys are different). But for map side join, the joins would be complete by the end of the map phase, and outputs should be ready to be dis
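To make the idea in this message concrete, here is a minimal Python sketch of a single job whose map phase performs the join against an in-memory table and whose reduce phase performs the group by. It is a toy simulation of the concept only, not Hive's planner or execution engine; the table contents and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: a small table held in memory and a large table streamed by mappers.
small_table = {1: "US", 2: "CA"}               # join key -> country (the group by key)
big_table = [(1, 10), (2, 5), (1, 7), (3, 4)]  # (join key, amount)

def map_phase(rows, lookup):
    """Hash-join each row against the in-memory table and emit (group key, value)."""
    for join_key, amount in rows:
        if join_key in lookup:  # inner join: rows without a match are dropped
            yield lookup[join_key], amount

def shuffle(pairs):
    """Collect map output by group key, as the shuffle between map and reduce would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate each group; here the aggregate is a sum."""
    return {key: sum(values) for key, values in grouped.items()}

result = reduce_phase(shuffle(map_phase(big_table, small_table)))
print(result)
```

Because the join finishes inside the mapper, the map output can be keyed on the group by column directly, which is the single-job plan being asked about.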

Re: map side join with group by

2012-12-13 Thread Nitin Pawar
That's because the join keys for the first job and the group by keys for the second job are different; you just can't assume the join keys and group keys will be the same, so they are two different jobs. On Thu, Dec 13, 2012 at 8:26 PM, Chen Song wrote: > Yeah, my abridged version of query might be a lit
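One way to picture the point about differing keys is a small Python sketch. It assumes a reduce-side join, where rows are partitioned across reducers by join key, and the rows, the reducer count, and the modulo partitioner are all hypothetical stand-ins. It shows that rows sharing a group by key can land on different reducers, so each reducer only holds a partial aggregate.

```python
from collections import defaultdict

# Hypothetical joined rows: (join_key, group_key, value).
rows = [(1, "a", 10), (2, "a", 5), (3, "b", 1)]
NUM_REDUCERS = 2

def partition(key):
    # Stand-in for a hash partitioner; integer keys keep the result deterministic.
    return key % NUM_REDUCERS

partitions_for_group = defaultdict(set)
partial_sums = defaultdict(lambda: defaultdict(int))
for join_key, group_key, value in rows:
    p = partition(join_key)                  # rows are routed by join key
    partitions_for_group[group_key].add(p)
    partial_sums[p][group_key] += value      # each reducer sees only its share

# Groups whose rows were spread over more than one reducer.
split_groups = sorted(g for g, ps in partitions_for_group.items() if len(ps) > 1)
print(split_groups)
```

In this sketch group "a" is split across both reducers, so a second shuffle keyed on the group by column is needed before the final sums can be produced.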

Re: map side join with group by

2012-12-13 Thread Chen Song
Yeah, my abridged version of the query might be a little broken, but my point is that when a query has a map join and a group by, even in its simplified incarnation, it will launch two jobs. I was just wondering why map join and group by cannot be accomplished in one MR job. Best, Chen On Thu, Dec 13, 2

Re: map side join with group by

2012-12-12 Thread Nitin Pawar
I think Chen wanted to know why this is a two-phase query, if I understood it correctly. When you run a map side join, it just performs the join query; after that, to execute the group by part, it launches the second job. I may be wrong, but this is how I saw it whenever I executed group by queries

Re: map side join with group by

2012-12-12 Thread Mark Grover
Hi Chen, I think we would need some more information. The query is referring to a table called "d" in the MAPJOIN hint but there is no such table in the query. Moreover, map joins only make sense when the right table is the one being "mapped" (in other words, being kept in memory) in case of a Le