This seems useful, so I added a sentence to the explanation of STREAMTABLE in the Joins wikidoc <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-Examples>:

> - In every map/reduce stage of the join, the table to be streamed can be
>   specified via a hint, e.g. in
>
>   SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val FROM a JOIN b ON
>   (a.key = b.key1) JOIN c ON (c.key = b.key1)
>
>   all the three tables are joined in a single map/reduce job and the
>   values for a particular value of the key for tables b and c are buffered in
>   the memory in the reducers. Then for each row retrieved from a, the join is
>   computed with the buffered rows. If the STREAMTABLE hint is omitted,
>   Hive streams the rightmost table in the join.

But I didn't specify inner joins. Should that be made clear?

Thanks.

-- Lefty

On Tue, Dec 3, 2013 at 1:40 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> This is my understanding of both. Wait for the Hive gurus to correct me
> if I made any mistake.
>
> In Hive, when an inner join query runs, the table at the last position
> on the right streams its records to the reducers. This is the default
> behavior.
>
> So say you have a query select blah blah from t1 join t2 join t3 join t4
> on (blah blah).
> All the maps emitting key values for tables t1, t2, t3 just send them to
> the reducers, where they are buffered in memory, but the records of table
> t4 are streamed to the reducer for better memory management. That's why
> it's advised that you put the largest table on the right.
>
> This default behavior is changed by STREAMTABLE(t1), where you can tell
> Hive which table's data you want to be streamed.
>
> On the other hand, mapjoin is a concept where no reducers are
> involved. It's a join where the smaller table is buffered into the memory
> of each map and the joins are then performed by the maps themselves. As
> the smaller table's data is available in memory, map jobs are very fast
> because the reduce step is completely removed.
>
>
> On Tue, Dec 3, 2013 at 2:47 PM, Baahu <bahub...@gmail.com> wrote:
>
>> Hi,
>> What is the difference between the hints STREAMTABLE and MAPJOIN?
>>
>> Thanks,
>> Baahu
>>
>
>
> --
> Nitin Pawar
>
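
For anyone reading this thread later, the buffering-versus-streaming difference described above can be modelled with a small Python sketch. This is only a toy model of the idea, not how Hive implements joins: the function names `reduce_side_join` and `map_side_join` and the sample tables `a` and `b` are made up for illustration, and the "peak buffered rows" figure simply counts the rows held per key.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(tables, stream):
    """Toy model of a reduce-side inner join on a single key.

    tables: dict mapping table name -> list of (key, value) rows
    stream: name of the table whose rows are streamed instead of buffered
    Returns (joined rows, peak number of rows buffered for any one key).
    """
    names = list(tables)
    buffered_names = [n for n in names if n != stream]

    # "Shuffle": group every table's rows by join key, as a reducer sees them.
    groups = defaultdict(lambda: defaultdict(list))
    for name, rows in tables.items():
        for key, value in rows:
            groups[key][name].append(value)

    out, peak = [], 0
    for key, by_table in groups.items():
        # Rows of the non-streamed tables for this key are held in memory.
        buffers = [by_table.get(n, []) for n in buffered_names]
        peak = max(peak, sum(len(b) for b in buffers))
        # Rows of the streamed table are consumed one at a time.
        for streamed_value in by_table.get(stream, []):
            for combo in product(*buffers):
                row = dict(zip(buffered_names, combo))
                row[stream] = streamed_value
                out.append((key,) + tuple(row[n] for n in names))
    return out, peak


def map_side_join(big, small):
    """Toy model of a map join: hash the small table, probe with the big one."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # No shuffle or reduce step: each row of the big table is joined
    # directly against the in-memory copy of the small table.
    return [(key, big_value, small_value)
            for key, big_value in big
            for small_value in lookup.get(key, [])]


# Hypothetical sample data: table a has several rows per key, table b has one.
a = [(1, "a1"), (1, "a2"), (1, "a3"), (2, "a4")]
b = [(1, "b1"), (2, "b2")]

# Default behaviour in this model: stream the rightmost table (b), buffer a.
rows_default, peak_default = reduce_side_join({"a": a, "b": b}, stream="b")
# Equivalent of a STREAMTABLE(a) hint: stream a, buffer b.
rows_hinted, peak_hinted = reduce_side_join({"a": a, "b": b}, stream="a")
rows_map = map_side_join(a, b)
```

In this model all three calls return the same joined rows; only the amount held in memory differs. Streaming the table with more rows per key keeps the per-key buffer small, which is the reason given above for putting the largest table on the right or naming it in the hint.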