Thanks, Nitin. This is all I wanted to clarify :)
Chen
On Thu, Dec 13, 2012 at 2:30 PM, Nitin Pawar wrote:
To improve the speed of the job, they created map-only joins so that all the
records associated with a key fall to a map task; reducers slow it down. If
the reducer has to do some more work, then they launch another job.
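In MapReduce terms, a map-only (map-side) join can be sketched like this, assuming the smaller table fits in each mapper's memory. The table contents and the names `build_hash_table` and `map_side_join` are hypothetical illustrations, not Hive internals: each mapper loads the small table into a hash table and probes it while streaming its own split of the large table, so no shuffle or reducer is involved.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def build_hash_table(small_table: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Load the small table into an in-memory hash table keyed on the join key."""
    table: Dict[str, List[str]] = {}
    for key, attr in small_table:
        table.setdefault(key, []).append(attr)
    return table


def map_side_join(
    split: Iterable[Tuple[str, int]], small: Dict[str, List[str]]
) -> Iterator[Tuple[str, int, str]]:
    """One mapper: stream its split of the big table and probe the hash table.

    Inner-join semantics: rows without a match are dropped. Every output row is
    final as soon as it is produced, which is why no reducer is needed.
    """
    for key, value in split:
        for attr in small.get(key, []):
            yield key, value, attr


# Hypothetical data: two input splits of the big table and one small table.
small = build_hash_table([("k1", "red"), ("k2", "blue")])
splits = [[("k1", 10), ("k3", 5)], [("k2", 7), ("k1", 1)]]
output = [row for split in splits for row in map_side_join(split, small)]
print(output)  # [('k1', 10, 'red'), ('k2', 7, 'blue'), ('k1', 1, 'red')]
```

Each split is processed independently, which mirrors why a map-only job can run with zero reducers.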
Bear in mind: when we say map-only join, we are absolutely sure that the speed
will increase.
Nitin
Yeah. My original question is whether there is a way to force Hive (or
rather, whether it is possible) to execute the map-side join in the mapper
phase and the group by in the reduce phase. So instead of launching a
map-only job (join) and a MapReduce job (group by), it would do it all in a
single MR job.
Chen, in a map-side join there are no reducers; it's a MAP ONLY job.
On Thu, Dec 13, 2012 at 11:54 PM, Chen Song wrote:
I understand that it is impossible in the same MR job if both the join and
the group by are going to happen in the reduce phase (because the join keys
and group by keys are different). But for a map-side join, the join would be
complete by the end of the map phase, and the outputs should be ready to be
distributed.
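The single-job plan described above can be simulated in plain Python to show that the data flow is at least possible in MapReduce terms: the mapper finishes the join locally and then emits records keyed on the group-by column, so one shuffle and one reduce phase complete the aggregation. This is a toy sketch under assumed names and data (`mapper`, `shuffle`, `reducer`, `single_job`, grouping by the small table's attribute and summing); it says nothing about whether Hive's planner actually generates such a plan.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

SmallTable = Dict[str, List[str]]  # join_key -> attributes, held in memory


def mapper(split: Iterable[Tuple[str, int]], small: SmallTable) -> Iterator[Tuple[str, int]]:
    """Map phase: do the map-side join, then emit (group_by_key, value).
    The emitted key is the GROUP BY column, not the join key."""
    for join_key, value in split:
        for group_key in small.get(join_key, []):
            yield group_key, value


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Simulated shuffle: collect all map outputs that share a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(group_key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: aggregate one group (SUM in this sketch)."""
    return group_key, sum(values)


def single_job(splits: List[List[Tuple[str, int]]], small: SmallTable) -> List[Tuple[str, int]]:
    """One MR job: join in the mappers, group by in the reducers."""
    mapped = (pair for split in splits for pair in mapper(split, small))
    return sorted(reducer(k, vs) for k, vs in shuffle(mapped).items())


# Hypothetical data: "k3" has no match and is dropped by the inner join.
small = {"k1": ["red"], "k2": ["blue"], "k4": ["red"]}
splits = [[("k1", 10), ("k3", 5)], [("k2", 7), ("k1", 1), ("k4", 2)]]
print(single_job(splits, small))  # [('blue', 7), ('red', 13)]
```

The only shuffle here is on the group-by key, which works because the join never needed one.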
That's because the first job is keyed on the join keys and the second job on
the group by keys, and they are different. You just can't assume the join keys
and group keys will be the same, so they are two different jobs.
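To make the key mismatch concrete, here is a small Python sketch of what happens when the join itself runs in the reduce phase: rows are partitioned by the join key, so rows that share a group-by key can land on different reducers, and finishing the GROUP BY then needs a second shuffle on the group-by key. The integer keys, the modulo partitioner (a stand-in for Hadoop's hash partitioning) and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NUM_REDUCERS = 2


def partition(join_key: int, num_reducers: int = NUM_REDUCERS) -> int:
    """Toy partitioner: integer key modulo reducer count. Hadoop's default
    HashPartitioner uses the key's hash instead, but either way the reducer
    choice depends only on the join key."""
    return join_key % num_reducers


def groups_split_across_reducers(joined_rows: List[Tuple[int, str, int]]) -> List[str]:
    """Given joined rows (join_key, group_key, value) partitioned by JOIN key,
    return the group keys whose rows ended up on more than one reducer.
    Those groups cannot be aggregated without re-shuffling on the group key."""
    reducers_per_group: Dict[str, Set[int]] = defaultdict(set)
    for join_key, group_key, _value in joined_rows:
        reducers_per_group[group_key].add(partition(join_key))
    return sorted(g for g, reducers in reducers_per_group.items() if len(reducers) > 1)


rows = [(1, "red", 10), (2, "red", 3), (3, "blue", 7), (5, "blue", 1)]
print(groups_split_across_reducers(rows))  # ['red']
```

"blue" stays on one reducer only by coincidence (3 and 5 are both odd), which is exactly what a planner cannot rely on in general.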
On Thu, Dec 13, 2012 at 8:26 PM, Chen Song wrote:
Yeah, my abridged version of the query might be a little broken, but my point
is that when a query has a map join and a group by, even in its simplified
incarnation, it will launch two jobs. I was just wondering why the map join
and group by cannot be accomplished in one MR job.
Best,
Chen
On Thu, Dec 13, 2
I think Chen wanted to know why this is a two-phase query, if I understood it
correctly.
When you run a map-side join, it just performs the join query; after that, it
launches a second job to execute the group by part.
I may be wrong, but this is how I saw it whenever I executed group by
queries.
Hi Chen,
I think we would need some more information.
The query refers to a table called "d" in the MAPJOIN hint, but there is no
such table in the query. Moreover, map joins only make sense when the right
table is the one being "mapped" (in other words, being kept in memory) in the
case of a Left Outer Join.
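A toy Python sketch of why, for a left outer join, the in-memory table has to be the right-hand one: a mapper streams only its own split, so it can emit an unmatched streamed (left) row on its own, but it cannot know whether an in-memory left row found a match in some other mapper's split. The data and function names below are hypothetical; this illustrates the general constraint, not Hive's code.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Out = Tuple[str, int, Optional[str]]


def left_outer_map_join(left_split: Iterable[Tuple[str, int]],
                        right_in_memory: Dict[str, List[str]]) -> Iterator[Out]:
    """Correct plan: stream LEFT, keep RIGHT in memory. Each left row is seen
    by exactly one mapper, which can pad it with None when nothing matches."""
    for key, lval in left_split:
        matches = right_in_memory.get(key, [])
        if matches:
            for rval in matches:
                yield key, lval, rval
        else:
            yield key, lval, None


def left_outer_with_left_in_memory(left_in_memory: Dict[str, List[int]],
                                   right_split: Iterable[Tuple[str, str]]) -> Iterator[Out]:
    """Broken plan: keep LEFT in memory, stream RIGHT. A mapper only sees its
    own right split, so it treats left rows matched in other splits as
    unmatched and emits spurious None-padded rows."""
    matched = set()
    for key, rval in right_split:
        for lval in left_in_memory.get(key, []):
            matched.add(key)
            yield key, lval, rval
    for key, lvals in left_in_memory.items():
        if key not in matched:
            for lval in lvals:
                yield key, lval, None


right = {"a": ["x"], "b": ["y"]}
left_splits = [[("a", 1)], [("b", 2), ("c", 3)]]
good = [r for s in left_splits for r in left_outer_map_join(s, right)]
print(good)  # [('a', 1, 'x'), ('b', 2, 'y'), ('c', 3, None)]

left = {"a": [1], "b": [2], "c": [3]}
right_splits = [[("a", "x")], [("b", "y")]]
bad = [r for s in right_splits for r in left_outer_with_left_in_memory(left, s)]
print(len(bad))  # 6 (the correct result has 3 rows)
```

With the left table in memory, every mapper independently decides that rows matched elsewhere are unmatched, so each split adds its own None-padded copies.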