Good point... I should have used ON. With ON it runs fine as a map-join, and if I set hive.auto.convert.join=false, it runs with my specified number of reducers.
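For reference, here is a sketch of the query from the quoted thread rewritten with the join conditions in an ON clause, combined with the two settings mentioned in this thread. The table and column names are the generic placeholders from the original question, and the reducer count of 32 is an arbitrary example value, not a recommendation.

```sql
-- Keep Hive from auto-converting this into a map-join,
-- so the join runs with a reduce phase.
set hive.auto.convert.join=false;
-- Requested number of reducers (32 is an arbitrary example value).
set mapred.reduce.tasks=32;

-- Join keys go in ON rather than WHERE, so Hive joins on them
-- instead of building a Cartesian product and filtering afterwards.
create table z as
select x.*
from table1 x
join table2 y
  on (x.col1 = y.col1 and
      x.col2 = y.col2 and
      x.col3 = y.col3 and
      x.col4 = y.col4 and
      x.col5 = y.col5);
```

If hive.auto.convert.join is left enabled, Hive may run this as a map-join instead (as described above), in which case the reducer setting does not come into play.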
On Thu, Jan 12, 2012 at 6:12 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You should do joins using the ON clause.
> https://cwiki.apache.org/Hive/languagemanual-joins.html
> Be careful: if you do the joins wrong, Hive does a Cartesian product
> followed by a really long reduce phase rather than the optimal join process.
>
> On Thu, Jan 12, 2012 at 6:04 PM, Aaron McCurry <amccu...@gmail.com> wrote:
>
>> I see that your query is kinda generic and probably not the original
>> query. I have seen this behavior with a simple typo like:
>>
>> Notice col3.
>>
>> create table z as select x.* from table1 x join table2 y where (
>> x.col1 = y.col1 and
>> x.col2 = y.col2 and
>> y.col3 = y.col3 and
>> x.col4 = y.col4 and
>> x.col5 = y.col5
>> );
>>
>> Just a thought.
>>
>> Aaron
>>
>> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz <wlangiew...@gmail.com> wrote:
>>
>>> Hello,
>>> Have you tried running only the select, without creating a table? What are
>>> the results?
>>> How did you try to set the number of reducers? Have you used this:
>>> set mapred.reduce.tasks = xyz;
>>> How many mappers does this query use?
>>>
>>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>>
>>>> I am running a basic join of 2 tables and it will only run with 1
>>>> reducer. Why is that? I tried to set the number of reducers and it
>>>> didn't work. Hive just ignored it.
>>>>
>>>> create table z as select x.* from table1 x join table2 y where (
>>>> x.col1 = y.col1 and
>>>> x.col2 = y.col2 and
>>>> x.col3 = y.col3 and
>>>> x.col4 = y.col4 and
>>>> x.col5 = y.col5
>>>> );
>>>>
>>>> Both tables are backed by multiple files / blocks / chunks.
>>>
>>> --
>>> Wojciech Langiewicz