Re: Cross join/cartesian product explanation

2015-11-17 Thread Gopal Vijayaraghavan
> It’s really a very simple query that I’m trying to run:
> select ...
> bloom_contains(a_id, b_id_bloom)
That's nearly impossible to optimize directly - there is no way to limit the number of table_b rows which may match table a. More than one bloom filter can successfully match a single row f
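To see why this is hard to optimize, model the query as a nested loop. This is a toy Python simulation, not Rory's UDF. The `BloomFilter` class, its bit and hash parameters, and the sample tables are all hypothetical. It shows two things. First, `bloom_contains` has to be evaluated once per (a, b) pair. Second, one `a_id` can legitimately match the filters of several `table_b` rows, so there is no single key to partition the work on.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; a hypothetical stand-in for the contents of b_id_bloom."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one Python int

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def contains(self, item):
        # May return false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bloom_contains(a_id, bloom):
    """Hypothetical equivalent of the custom UDF from the thread."""
    return bloom.contains(a_id)


def nested_loop_join(table_a, table_b):
    """Evaluate the predicate for every (a, b) pair, as a cross join must."""
    matches, checks = [], 0
    for a_id in table_a:
        for b_id, bloom in table_b:
            checks += 1
            if bloom_contains(a_id, bloom):
                matches.append((a_id, b_id))
    return matches, checks


def make_bloom(ids):
    bloom = BloomFilter()
    for i in ids:
        bloom.add(i)
    return bloom


table_a = [1, 2, 3, 7, 9]
table_b = [("b1", make_bloom([1, 2, 7])), ("b2", make_bloom([7, 9]))]
matches, checks = nested_loop_join(table_a, table_b)
print(checks)  # 10: every a row is tested against every bloom filter
print(sorted(m for m in matches if m[0] == 7))  # a_id 7 matches both b1 and b2
```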

Re: Cross join/cartesian product explanation

2015-11-13 Thread Rory Sawyer
Hi Gopal, Thanks for the detailed response. It’s really a very simple query that I’m trying to run:
select a.a_id, b.b_id, count(*) as c
from table_a a, table_b b
where bloom_contains(a_id, b_id_bloom)
group by a.a_id, b.b_id;
Where “bloom_contains” is a custom U

Re: Cross join/cartesian product explanation

2015-11-10 Thread Gopal Vijayaraghavan
> I’m having trouble doing a cross join between two tables that are too big
> for a map-side join.
The actual query would help, btw. Usually what is planned as a cross-join can be optimized out into a binning query with a custom UDF. In particular with 2-D geo queries with binning, which
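The binning idea can be sketched outside Hive. Instead of comparing every pair of points, each point is assigned to a grid cell, and only points in the same or neighbouring cells are compared. In Hive terms the cell id would become an equi-join key that can be shuffled across many reducers, replacing the cross join. This is a minimal Python sketch that assumes a plain Euclidean distance threshold on 2-D points. The function names and the cell-size choice are illustrative and are not the UDF referred to in the email.

```python
import math
import random
from collections import defaultdict
from itertools import product


def cell_of(x, y, cell_size):
    """Map a point to its integer grid cell (the 'bin')."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cross_join_pairs(a_pts, b_pts, radius):
    """Baseline: test every (a, b) pair, i.e. a cartesian product."""
    return {
        (i, j)
        for i, (ax, ay) in enumerate(a_pts)
        for j, (bx, by) in enumerate(b_pts)
        if math.hypot(ax - bx, ay - by) <= radius
    }


def binned_pairs(a_pts, b_pts, radius):
    """Same result, but only compares points in the 3x3 block of neighbouring cells.

    With cell_size == radius, any b within `radius` of a lies at most one cell
    away from a in each direction, so no matching pair is missed.
    """
    cell_size = radius
    b_by_cell = defaultdict(list)
    for j, (bx, by) in enumerate(b_pts):
        b_by_cell[cell_of(bx, by, cell_size)].append(j)

    out = set()
    for i, (ax, ay) in enumerate(a_pts):
        cx, cy = cell_of(ax, ay, cell_size)
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in b_by_cell.get((cx + dx, cy + dy), []):
                bx, by = b_pts[j]
                if math.hypot(ax - bx, ay - by) <= radius:
                    out.add((i, j))
    return out


rng = random.Random(42)
a_pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(300)]
b_pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(300)]
baseline = cross_join_pairs(a_pts, b_pts, radius=5.0)
binned = binned_pairs(a_pts, b_pts, radius=5.0)
print(binned == baseline)  # True
```

The binned version touches only nearby candidates, so its cost grows with the number of close pairs instead of with len(a) × len(b).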

Re: Cross join/cartesian product explanation

2015-11-09 Thread Rory Sawyer
Hi Gopal, Thanks for the speedy response! A follow-up question though: 10Mb input sounds like that would work for a map join. I’m having trouble doing a cross join between two tables that are too big for a map-side join. Trying to break down one table into small enough partitions and then
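Rory's message is cut off, but the general idea of splitting one side into pieces small enough to broadcast can be sketched as follows. Each chunk of `table_b` is paired with all of `table_a` in an independent task; in Hive, each task would presumably be a separate map-side join. The union of the task outputs is the full cartesian product. This is a hypothetical Python simulation, and `chunk_size` stands in for whatever fits under the map-join size limit.

```python
from itertools import product


def chunks(rows, chunk_size):
    """Split rows into consecutive pieces of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def broadcast_cross_join(big, small_chunk):
    """One independent task: every big-side row paired with a small, in-memory chunk."""
    return [(a, b) for a in big for b in small_chunk]


def chunked_cross_join(table_a, table_b, chunk_size):
    tasks = [broadcast_cross_join(table_a, c) for c in chunks(table_b, chunk_size)]
    # The union (UNION ALL) of all task outputs is the full cartesian product.
    return [pair for task in tasks for pair in task], len(tasks)


table_a = list(range(6))
table_b = list("abcdefg")
result, num_tasks = chunked_cross_join(table_a, table_b, chunk_size=3)
print(num_tasks)  # 3 tasks (chunks of 3, 3 and 1 rows)
print(len(result))  # 42 = 6 * 7
print(sorted(result) == sorted(product(table_a, table_b)))  # True
```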

Re: Cross join/cartesian product explanation

2015-11-06 Thread Gopal Vijayaraghavan
> Over the last few weeks I’ve been trying to use cross joins/cartesian
> products and was wondering why, exactly, this all gets sent to one
> reducer. All I’ve heard or read is that Hive can’t/doesn’t parallelize
> the job.
The hashcode of the shuffle key is 0, since you need to process every row a
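A small simulation of hash partitioning illustrates the point about the shuffle key. The formula below is the one used by Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. Hive's own reduce-sink hashing may differ in detail, so treat this as a model rather than Hive's code. When every row carries the same key hash (0 here), every row lands on reducer 0, however many reducers are configured.

```python
def partition_for(key_hash, num_reducers):
    """Hadoop-style hash partitioning: mask the sign bit, then take the modulus."""
    return (key_hash & 0x7FFFFFFF) % num_reducers


def shuffle(rows, key_hash_of, num_reducers):
    """Route each row to a reducer based on the hash of its shuffle key."""
    reducers = {r: [] for r in range(num_reducers)}
    for row in rows:
        reducers[partition_for(key_hash_of(row), num_reducers)].append(row)
    return reducers


rows = list(range(1000))
num_reducers = 50  # e.g. what `set mapred.reduce.tasks=50;` asks for

# Cross join: no join key, so every row is shuffled with the same constant hash.
cross = shuffle(rows, lambda row: 0, num_reducers)
# Equi-join: the key varies per row, so rows spread over the reducers.
equi = shuffle(rows, lambda row: row, num_reducers)

print([r for r, rs in cross.items() if rs])  # [0]
print(sum(1 for rs in equi.values() if rs))  # 50
```

This is also why raising the reducer count alone does not help a cross join. The extra reducers exist, but no row is ever routed to them.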

Cross join/cartesian product explanation

2015-11-06 Thread Rory Sawyer
Hi all, Over the last few weeks I’ve been trying to use cross joins/cartesian products and was wondering why, exactly, this all gets sent to one reducer. All I’ve heard or read is that Hive can’t/doesn’t parallelize the job. Is there some code people can point me to? Does anyone have a workaroun

Re: Best way to avoid cross join

2014-03-19 Thread Nitin Pawar
r tried so you may give it a shot.
On Wed, Mar 19, 2014 at 6:00 PM, fab wol wrote:
> Hey Nitin,
>
> Yong wrote exactly the opposite in his first sentence:
>
> *Cross join doesn't mean Hive has to use one reducer.*
>
> and this super old thread here lets me also assume th

Re: Best way to avoid cross join

2014-03-19 Thread fab wol
Hey Nitin, Yong wrote exactly the opposite in his first sentence: *Cross join doesn't mean Hive has to use one reducer.* and this super old thread here also lets me assume that more than one reducer can be used: http://mail-archives.apache.org/mod_mbox/hive-user/200904

Re: Best way to avoid cross join

2014-03-19 Thread Nitin Pawar
hey Wolli, sorry, missed this one. As Yong already replied, a cross join always uses only one reducer. If you want to avoid this, can you try making it a full outer join with the condition (1 = 1) and see if you get your desired result?
On Wed, Mar 19, 2014 at 4:05 PM, fab wol wrote

Re: Best way to avoid cross join

2014-03-19 Thread fab wol
contains
> roughly 2000 keywords. So the result of this cross join will be ca. 2.4
> billion rows which need to be checked (see INSTR() function).
>
> Thx for looking into ...
>
> Cheers
> Wolli
>
>
> 2014-03-05 15:35 GMT+01:00 Nitin Pawar :
>
>> setting num

Re: Best way to avoid cross join

2014-03-14 Thread fab wol
Hey Nitin, import1 contains at least 1.2 million rows, with almost the same number of distinct ids and approximately 40k distinct keywords. et_keywords contains roughly 2000 keywords. So the result of this cross join will be ca. 2.4 billion rows, which need to be checked (see the INSTR() function). Th

RE: Best way to avoid cross join

2014-03-05 Thread java8964
Sorry, my mistake. I didn't notice that you are using a cross join. Yes, a cross join will always use one reducer, at least that is my understanding.
Yong
Date: Wed, 5 Mar 2014 15:27:48 +0100
Subject: Re: Best way to avoid cross join
From: darkwoll...@gmail.com
To: user@hive.apache.org
hey

Re: Best way to avoid cross join

2014-03-05 Thread Nitin Pawar
, 2014 at 7:57 PM, fab wol wrote:
> hey Yong,
>
> Even without the group by (pure cross join) the query is only using one
> reducer. Even specifying more reducers doesn't help:
>
> set mapred.reduce.tasks=50;
> SELECT id1,
> m.keyword,
> prep_kw.keyword

Re: Best way to avoid cross join

2014-03-05 Thread fab wol
hey Yong, Even without the group by (pure cross join) the query is only using one reducer. Even specifying more reducers doesn't help:
set mapred.reduce.tasks=50;
SELECT id1,
       m.keyword,
       prep_kw.keyword
FROM (select id1, keyword from import1) m
CROSS JOIN (SELECT keyword

RE: Best way to avoid cross join

2014-03-05 Thread java8964
Hi, Wolli: Cross join doesn't mean Hive has to use one reducer. From a query point of view, the following cases will use one reducer:
1) ORDER BY in your query (instead of using SORT BY)
2) Only one reducer group, which means all the data have to be sent to one reducer, as there is only one
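Yong's first case (ORDER BY versus SORT BY) can be modelled in a few lines. A total order needs every row in one place. SORT BY only sorts within each reducer, so the work can be split, but the concatenated output is not globally sorted. The round-robin routing below is a simplification chosen for illustration and is not how Hive assigns rows to reducers.

```python
def order_by(rows):
    """Global ORDER BY: a single reducer sees and sorts every row."""
    return sorted(rows)


def sort_by(rows, num_reducers):
    """SORT BY: rows are split across reducers and each reducer sorts only its share."""
    shares = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        shares[i % num_reducers].append(row)  # simplified round-robin routing
    return [sorted(share) for share in shares]


rows = [5, 3, 8, 1, 9, 2]
print(order_by(rows))  # [1, 2, 3, 5, 8, 9]
per_reducer = sort_by(rows, 2)
print(per_reducer)  # [[5, 8, 9], [1, 2, 3]]
print([x for share in per_reducer for x in share])  # [5, 8, 9, 1, 2, 3], not globally sorted
```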

Best way to avoid cross join

2014-03-05 Thread fab wol
WHEN m.keyword IS NULL THEN 0
WHEN instr(m.keyword, prep_kw.keyword) > 0 THEN 1
ELSE 0 END) AS flag
FROM (select id1, keyword from import1) m
CROSS JOIN (SELECT keyword FROM et_keywords) prep_kw
GROUP BY id1;
Since there is a cross join involved, the execution gets pinned down
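One common way to spread a cross join over several reducers is to replicate the smaller side to every reducer and send each row of the larger side to exactly one reducer, chosen by a salt. This technique is not proposed in this thread, so treat it as a swapped-in approach. Each reducer builds its local cartesian product, and together the reducers produce every pair exactly once. The Python below only simulates that routing. The round-robin salt and the table contents are hypothetical stand-ins for the `import1` and `et_keywords` tables.

```python
from collections import Counter
from itertools import product


def salted_cross_join(big, small, num_reducers):
    """Simulate a cross join distributed over num_reducers reducers."""
    left = [[] for _ in range(num_reducers)]
    right = [[] for _ in range(num_reducers)]
    # Big side: each row gets a salt and goes to exactly one reducer.
    for i, row in enumerate(big):
        left[i % num_reducers].append(row)
    # Small side: every row is copied to every reducer.
    for row in small:
        for r in range(num_reducers):
            right[r].append(row)
    # Each reducer computes its local cross product independently.
    return [[(b, s) for b in left[r] for s in right[r]] for r in range(num_reducers)]


big = [f"id{i}" for i in range(10)]  # hypothetical stand-in for import1 rows
small = ["foo", "bar", "baz"]  # hypothetical stand-in for et_keywords
outputs = salted_cross_join(big, small, num_reducers=4)

print([len(out) for out in outputs])  # [9, 9, 6, 6]
combined = Counter(pair for out in outputs for pair in out)
print(combined == Counter(product(big, small)))  # True: every pair exactly once
```

The price is that the small side is shuffled once per reducer. This is cheap when that side is small, like a 2000-row keyword list, and expensive when both sides are large.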

Re: cross join

2012-12-03 Thread Mark Grover
arounds go, you could try having each table in a sub-query and have an extra virtual column ("1 as one") and join on that virtual column.
Mark
On Mon, Dec 3, 2012 at 9:26 AM, Periya.Data wrote:
> Hi Hive users,
> I have Hive CDH - 0.7.1. I want to know if I can do cross-j
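Adding a constant virtual column to both sides and joining on it turns the cross join into an equi-join whose key is the same for every row. The Python sketch below uses a simple in-memory hash join with hypothetical table contents. It shows that joining on such a column yields exactly the cartesian product. The flip side is that every row shares one key, so a hash-partitioned shuffle would still send everything to a single reducer.

```python
from collections import defaultdict
from itertools import product


def hash_join(left, right, key_left, key_right):
    """Classic hash join: build a table on the right side, probe with the left."""
    build = defaultdict(list)
    for rrow in right:
        build[key_right(rrow)].append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in build.get(key_left(lrow), [])]


table_1 = [{"x": 1}, {"x": 2}, {"x": 3}]
table_2 = [{"y": "a"}, {"y": "b"}]

# Equivalent of selecting `1 as one` in each sub-query and joining on `one`.
with_one_1 = [dict(row, one=1) for row in table_1]
with_one_2 = [dict(row, one=1) for row in table_2]
joined = hash_join(with_one_1, with_one_2, lambda r: r["one"], lambda r: r["one"])

pairs = [(lrow["x"], rrow["y"]) for lrow, rrow in joined]
print(pairs == list(product([1, 2, 3], ["a", "b"])))  # True
print(len({lrow["one"] for lrow, _ in joined}))  # 1 distinct join key
```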

Re: Cross join in Hive.

2011-05-02 Thread Raghunath, Ranjith
Very cool. What is the nonstrict option for?
Thanks,
Ranjith
From: Ashish Thusoo
To:
Sent: Mon May 02 16:00:39 2011
Subject: Re: Cross join in Hive.
you could probably just say (1 = 1) in the on clause for the join.
set hive.mapred.mode=nonstrict;
select

Re: Cross join in Hive.

2011-05-02 Thread Ashish Thusoo
first table to be equal to the same column in the other table.
Thanks,
Ranjith
From: Raghunath, Ranjith <ranjith.raghuna...@usaa.com>
To: 'user@hive.apache.org' <user@hive.apache.org>
Sent: Mon May 02 00:21:57 2011
Subject: Re:

Re: Cross join in Hive.

2011-05-01 Thread Raghunath, Ranjith
11 Subject: Re: Cross join in Hive. I haven't tested this out but plan to in 6 hours. Add an extra column and set it to 1 in both tables. Perform an inner join between the two tables. Thanks, Ranjith From: Abhinov Agarwal To: user@hive.apache.org Sent: S

Re: Cross join in Hive.

2011-05-01 Thread Raghunath, Ranjith
Cross join in Hive. Hi, I need to take a cross join of a big table with itself. Is it possible to do it using Hive?
E.g.
Set: 1 2 3
Result: 1,1 1,2 1,3 2,1 2,2 2,3 3,1 3,2 3,3
This would also do: 1,2 1,3 2,3
In fact the second one is what I want. I know cross join is not supported in Hive

Cross join in Hive.

2011-05-01 Thread Abhinov Agarwal
Hi, I need to take a cross join of a big table with itself. Is it possible to do it using Hive?
E.g.
Set: 1 2 3
Result: 1,1 1,2 1,3 2,1 2,2 2,3 3,1 3,2 3,3
This would also do: 1,2 1,3 2,3
In fact the second one is what I want. I know cross join is not supported in Hive, any other way to
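The second result set above is the cross product filtered to `a < b`. In it, each unordered pair appears once and no element is paired with itself. In Hive this could presumably be written as a self-join followed by that inequality in a WHERE clause. The Python sketch below only illustrates the set semantics and the row counts, n² versus n(n-1)/2.

```python
from itertools import combinations, product


def full_cross(values):
    """All ordered pairs, including (x, x): the first result set in the email."""
    return list(product(values, repeat=2))


def upper_triangle(values):
    """Cross product filtered to a < b: each unordered pair once, no self-pairs."""
    return [(a, b) for a, b in product(values, repeat=2) if a < b]


values = [1, 2, 3]
print(full_cross(values))  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
print(upper_triangle(values))  # [(1, 2), (1, 3), (2, 3)]
# For distinct values, the filtered result equals itertools.combinations(values, 2).
print(upper_triangle(values) == list(combinations(values, 2)))  # True
```

The filter only halves the output. A naive plan still enumerates all n² candidate pairs before discarding the unwanted ones.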