Re: Best way to avoid cross join

2014-03-19 Thread Nitin Pawar
From the mail thread's last line: The correct fix would be to have 1 reducer in case of a Cartesian product; the hack to avoid this was (1 = 1). I think that's been taken care of by the hash partitioner, which sends everything to a single reducer. The other option (at least for me) looks like going with a Pig script. Never tried so yo

Re: Best way to avoid cross join

2014-03-19 Thread fab wol
Hey Nitin, Yong wrote exactly the opposite in his first sentence: *Cross join doesn't mean Hive has to use one reduce.* and this super old thread here also leads me to assume that more than one reducer can be used: http://mail-archives.apache.org/mod_mbox/hive-user/200904.mbox/%3ca132f89f9b9df

Re: Best way to avoid cross join

2014-03-19 Thread Nitin Pawar
Hey Wolli, sorry, I missed this one. As Yong already replied, a cross join always uses only one reducer. If you want to avoid this, can you try making it a full outer join with the ON condition (1 = 1) and see if you get your desired result? On Wed, Mar 19, 2014 at 4:05 PM, fab wol wrote: > anyon

Re: Best way to avoid cross join

2014-03-19 Thread fab wol
Anyone? I still haven't solved this problem. Any help is appreciated. Cheers Wolli 2014-03-14 10:55 GMT+01:00 fab wol : > Hey Nitin, > > import1 has at least 1.2 million rows, with almost the same number of > distinct IDs and approximately 40k distinct keywords. et_keywords contains > roundabout

Re: Best way to avoid cross join

2014-03-14 Thread fab wol
Hey Nitin, import1 has at least 1.2 million rows, with almost the same number of distinct IDs and approximately 40k distinct keywords. et_keywords contains roughly 2000 keywords. So the result of this cross join will be about 2.4 billion rows, which need to be checked (see the INSTR() function). Thx for l
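
To make the size estimate above concrete, here is a minimal Python sketch of what the cross join plus INSTR() check amounts to. It is not the poster's query: the sample rows and keywords are hypothetical, and `instr` is only a stand-in that mimics the SQL-style convention (1-based position, 0 when there is no match).

```python
def instr(haystack: str, needle: str) -> int:
    # Stand-in for a SQL-style INSTR: 1-based position of the first match, 0 if absent.
    return haystack.find(needle) + 1


def cross_join_matches(rows, keywords):
    # Pair every row with every keyword (the cross join) and keep pairs
    # where the keyword occurs inside the row's text.
    for row_id, text in rows:
        for kw in keywords:
            if instr(text, kw) > 0:
                yield row_id, text, kw


# Hypothetical sample data, for illustration only.
rows = [(1, "cheap flights berlin"), (2, "hotel rome")]
keywords = ["berlin", "rome", "paris"]
matches = list(cross_join_matches(rows, keywords))

# Pair count from the figures in the message: 1.2 million rows x 2000 keywords.
estimated_pairs = 1_200_000 * 2_000
```

Every one of the estimated pairs has to be tested once, which is why the work is expensive if it all lands on a single reducer.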

RE: Best way to avoid cross join

2014-03-05 Thread java8964
Sorry, my mistake. I didn't notice that you are using a cross join. Yes, a cross join will always use one reducer, at least that is my understanding. Yong Date: Wed, 5 Mar 2014 15:27:48 +0100 Subject: Re: Best way to avoid cross join From: darkwoll...@gmail.com To: user@hive.apache.org hey

Re: Best way to avoid cross join

2014-03-05 Thread Nitin Pawar
Setting the number of reducers will not normally help unless there are that many keys for the reducers. Even if it launches that many reducers, most of them may simply not get any data. Can you share how many different ids there are and what the data sizes are in rows? On Wed, Mar 5, 2
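
The point about keys and reducers can be illustrated with a small Python sketch. It is a simplification, not Hive's actual partitioner: `partition` is a hypothetical helper that hashes a key and takes it modulo the reducer count, and the sketch assumes a cross join gives every row the same (constant) join key.

```python
import zlib
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for a hash partitioner: hash the key, then mod by reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_load(keys, num_reducers):
    # Count how many rows each reducer would receive.
    return Counter(partition(k, num_reducers) for k in keys)


# Assumption: with no join condition, every row carries the same constant key.
cross_load = reducer_load(["" for _ in range(1000)], 50)

# With many distinct keys, the same 1000 rows spread over several reducers.
spread_load = reducer_load([f"id{i}" for i in range(1000)], 50)
```

Under these assumptions `cross_load` has a single entry holding all 1000 rows, so asking for 50 reducers leaves 49 of them idle, while `spread_load` is distributed across many reducers.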

Re: Best way to avoid cross join

2014-03-05 Thread fab wol
Hey Yong, even without the GROUP BY (pure cross join) the query is only using one reducer. Even specifying more reducers doesn't help: set mapred.reduce.tasks=50; SELECT id1, m.keyword, prep_kw.keyword FROM (select id1, keyword from import1) m CROSS JOIN (SELECT keyword FROM et_ke

RE: Best way to avoid cross join

2014-03-05 Thread java8964
Hi, Wolli: Cross join doesn't mean Hive has to use one reduce. From the query's point of view, the following cases will use one reducer: 1) ORDER BY in your query (instead of SORT BY) 2) Only one reducer group, which means all the data has to be sent to one reducer, as there is only one reducer gro