RE: ShuffledHashJoin Possible Issue

Cheng, Hao Sun, 18 Oct 2015 19:23:57 -0700

Hi Gsvic, can you please provide the detailed code / steps to reproduce that?

Hao

-----Original Message-----
From: gsvic [mailto:victora...@gmail.com] 
Sent: Monday, October 19, 2015 3:55 AM
To: dev@spark.apache.org
Subject: ShuffledHashJoin Possible Issue

I am doing some experiments with join algorithms in SparkSQL and I am facing 
the following issue:

I have constructed two "dummy" JSON tables, t1.json and t2.json. Each of them 
has two columns, ID and Value. The ID is an incremental (unique) integer and the 
Value is a random value. I am running an equi-join query on the ID attribute.
With the SortMerge and BroadcastHashJoin algorithms the returned result is 
correct, but with ShuffledHashJoin the count aggregate always returns 
zero. The correct count is that of t2, as t2.ID is a subset of t1.ID.

The query is *t1.join(t2).where(t1("ID").equalTo(t2("ID")))*
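
For reference, here is a minimal plain-Scala sketch (no Spark dependency) of 
what a shuffled hash join is expected to compute on data shaped like the 
tables described above. It is not Spark's implementation: the object name, the 
helper functions, the partition count and the generated rows are all 
hypothetical and only illustrate the technique (hash-partition both sides on 
the key, build a hash table per partition, probe it with the other side).

```scala
// Illustrative sketch only; every name here is hypothetical, not a Spark internal.
object ShuffledHashJoinSketch {
  type Row = (Int, Double) // (ID, Value)

  // "Shuffle": route each row to a partition derived from its join key,
  // so rows with equal IDs from both tables land in the same partition.
  def shuffle(rows: Seq[Row], numPartitions: Int): Map[Int, Seq[Row]] =
    rows.groupBy { case (id, _) => Math.floorMod(id, numPartitions) }

  // Join one pair of co-located partitions: build a hash table on one side,
  // then probe it with every row of the other side.
  def joinPartition(build: Seq[Row], probe: Seq[Row]): Seq[(Int, Double, Double)] = {
    val table: Map[Int, Seq[Row]] = build.groupBy(_._1)
    for {
      (id, probeValue) <- probe
      (_, buildValue) <- table.getOrElse(id, Seq.empty)
    } yield (id, probeValue, buildValue)
  }

  // Equi-join on ID; each output row is (ID, t1 Value, t2 Value).
  def join(t1: Seq[Row], t2: Seq[Row], numPartitions: Int): Seq[(Int, Double, Double)] = {
    val left = shuffle(t1, numPartitions)
    val right = shuffle(t2, numPartitions)
    (0 until numPartitions).flatMap { p =>
      // Build on t2 (assumed to be the smaller side), probe with t1.
      joinPartition(right.getOrElse(p, Seq.empty), left.getOrElse(p, Seq.empty))
    }
  }

  def main(args: Array[String]): Unit = {
    val rnd = new scala.util.Random(42)
    // Assumed data: unique incremental IDs, random Values, t2.ID a subset of t1.ID.
    val t1: Seq[Row] = (1 to 1000).map(id => (id, rnd.nextDouble()))
    val t2: Seq[Row] = (1 to 1000 by 10).map(id => (id, rnd.nextDouble()))
    val joined = join(t1, t2, numPartitions = 8)
    // With unique IDs and t2.ID a subset of t1.ID, the two counts should match.
    println(s"join count = ${joined.size}, t2 count = ${t2.size}")
  }
}
```

Under these assumptions the join count equals the number of rows in t2, which 
is the behaviour reported for SortMerge and BroadcastHashJoin; a count of zero 
would mean that no probe row ever found its key in the hash table.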

--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/ShuffledHashJoin-Possible-Issue-tp14672.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional 
commands, e-mail: dev-h...@spark.apache.org
