Re: The differentce between SparkSql/DataFram join and Rdd join

2015-04-08 Thread Michael Armbrust
I think your thread dump for the master is actually just a thread dump for SBT that is waiting on a forked driver program. ... java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x7fed624ff528> (a java.lang.UNIXProcess) at java.lang.Obj

Re: The differentce between SparkSql/DataFram join and Rdd join

2015-04-08 Thread Hao Ren
> Finally, it works with RDD.join. What we have done is basically transforming 2 tables into 2 pair RDDs, then calling a join operation. It works great in about 500 s.
>
> However, workaround is just a workaround, since we have to transform

Re: The differentce between SparkSql/DataFram join and Rdd join

2015-04-07 Thread Michael Armbrust
> rdd2("receipt_id"))
> rddJoin.saveAsTable("testJoinTable", SaveMode.Overwrite)
>
> The RDD workaround in this case is a bit cumbersome; in short, we just created 2 RDDs, joined them, and then applied a new schema to the result RDD. This approach works, at least all ta

The differentce between SparkSql/DataFram join and Rdd join

2015-04-07 Thread Hao Ren
... just created 2 RDDs, joined them, and then applied a new schema to the result RDD. This approach works, at least all tasks were finished, while the DF/SQL approach doesn't. Any ideas?
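
The workaround described in this thread (key both tables by the join column, join the two keyed collections, then give the result a new schema) can be sketched as follows. This is only an illustration using plain Scala collections rather than Spark, so that it runs without a cluster; in Spark the corresponding steps would be a `map` to key-value pairs followed by `join` on the pair RDDs. The row types `Receipt`, `Item` and `Joined`, their fields other than the receipt id, and the helper names are hypothetical; the thread only names the join column `receipt_id`.

```scala
// Hypothetical row types: the thread only mentions the join column receipt_id.
case class Receipt(receiptId: Long, store: String)
case class Item(receiptId: Long, product: String)
// The "new schema" applied to the joined result.
case class Joined(receiptId: Long, store: String, product: String)

object PairJoinSketch {
  // Inner join of two keyed collections, mirroring the semantics of a pair-RDD join:
  // every left value is paired with every right value that shares its key.
  def pairJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
    val rightByKey: Map[K, Seq[B]] =
      right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    for {
      (k, a) <- left
      b <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (a, b))
  }

  // Step 1: key each table by receipt id. Step 2: join. Step 3: apply the new schema.
  def joinTables(receipts: Seq[Receipt], items: Seq[Item]): Seq[Joined] = {
    val left = receipts.map(r => (r.receiptId, r))
    val right = items.map(i => (i.receiptId, i))
    pairJoin(left, right).map { case (id, (r, i)) => Joined(id, r.store, i.product) }
  }

  def main(args: Array[String]): Unit = {
    val receipts = Seq(Receipt(1L, "A"), Receipt(2L, "B"))
    val items = Seq(Item(1L, "milk"), Item(1L, "bread"), Item(3L, "tea"))
    // Only receipt 1 has matching items; receipts 2 and 3 have no partner and are dropped.
    joinTables(receipts, items).foreach(println)
  }
}
```

The sketch says nothing about why the DataFrame/SQL join in the thread failed to finish; it only shows the shape of the keyed-join workaround.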