The picture is a bit hard to read.
I did a brief search but haven't found a JIRA for this issue.
Consider logging a SPARK JIRA.
Cheers
On Fri, Dec 18, 2015 at 4:37 AM, Gourav Sengupta wrote:
Hi,
The attached DAG shows that for the same table (self join) Spark is
unnecessarily reading data from S3 for one side of the join, whereas it is
able to use the cache for the other side.
Regards,
Gourav
On Fri, Dec 18, 2015 at 10:29 AM, Gourav Sengupta wrote:
Hi,
I have a table which is read directly from an S3 location, and even a self join on
that cached table is causing the data to be read from S3 again.
The query plan is shown below:
== Parsed Logical Plan ==
Aggregate [count(1) AS count#1804L]
Project [user#0,programme_key#515]
Join Inner, Some((p
Hi,
I think that people have reported the same issue elsewhere, and this should
be registered as a bug in Spark:
https://forums.databricks.com/questions/2142/self-join-in-spark-sql.html
Regards,
Gourav
On Thu, Dec 17, 2015 at 10:52 AM, Gourav Sengupta wrote:
Hi Ted,
The self join works fine on tables where the HiveContext tables are direct
Hive tables. For example:
table1 = hiveContext.sql("select columnA, columnB from hivetable1")
table1.registerTempTable("table1")
table1.cache()
table1.count()
If I then do a self join on table1, things work fine.
I did the following exercise in spark-shell ("c" is a cached table):
scala> sqlContext.sql("select x.b from c x join c y on x.a = y.a").explain
== Physical Plan ==
Project [b#4]
+- BroadcastHashJoin [a#3], [a#125], BuildRight
:- InMemoryColumnarTableScan [b#4,a#3], InMemoryRelation [a#3,b#4,c#5],
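One rough way to check a plan like the one above is to count how many cached scans it contains. The sketch below is only a heuristic: it assumes the operator is printed as InMemoryColumnarTableScan, as in this thread (other Spark versions may use a different name), and the helper names count_cached_scans and both_sides_cached are made up for illustration.

```python
# Operator name as it appears in the plan quoted in this thread.
# Assumption: other Spark versions may print a different name.
CACHED_SCAN = "InMemoryColumnarTableScan"


def count_cached_scans(plan_text):
    """Return how many cached (in-memory) scans appear in explain() output."""
    return plan_text.count(CACHED_SCAN)


def both_sides_cached(plan_text):
    """Heuristic: a two-way self join served entirely from cache
    should show at least two cached scans in its physical plan."""
    return count_cached_scans(plan_text) >= 2
```

To use it, paste the text printed by explain into a string and pass it to both_sides_cached. A result of False suggests that at least one side of the join is being read from the source again.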
Hi,
This is how the data can be created:
1. TableA : cached()
2. TableB : cached()
3. TableC: TableA inner join TableB cached()
4. TableC join TableC does not take the data from cache but starts reading
the data for TableA and TableB from disk.
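The four steps above could be sketched in PySpark roughly as follows. This is a minimal sketch, not the exact code that was run: the queries sql_a and sql_b, the join column key, and the helper names are all placeholders supplied by the caller, and it assumes the join column is the only column name the two tables share.

```python
def self_join_sql(table, key):
    """Build a self-join count query over a registered temp table."""
    return ("select count(1) from {t} x join {t} y on x.{k} = y.{k}"
            .format(t=table, k=key))


def cache_as(df, name):
    """Register df as a temp table, mark it cached, and materialize it."""
    df.registerTempTable(name)
    df.cache()
    df.count()  # cache() is lazy; count() forces the data into the cache
    return df


def reproduce(hive_context, sql_a, sql_b, key):
    """Run steps 1-4; hive_context is an existing HiveContext/SQLContext."""
    table_a = cache_as(hive_context.sql(sql_a), "TableA")      # 1. TableA, cached
    table_b = cache_as(hive_context.sql(sql_b), "TableB")      # 2. TableB, cached
    cache_as(table_a.join(table_b, key), "TableC")             # 3. inner join, cached
    result = hive_context.sql(self_join_sql("TableC", key))    # 4. TableC join TableC
    result.explain()  # prints the physical plan for inspection
    return result
```

If the cache is being used, the printed plan for step 4 should show in-memory scans for both sides of the join rather than scans of the underlying source.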
Does this sound like a bug? The self join between