I think your thread dump for the master is actually just a thread dump for
SBT that is waiting on a forked driver program.
...
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x7fed624ff528> (a java.lang.UNIXProcess)
at java.lang.Obj
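To dump the threads of the actual master JVM rather than the SBT launcher, one option is the JDK's own tools. A minimal sketch, assuming a standard JDK is installed on the master host (the grep target is the standalone master's main class; adjust if you run under YARN/Mesos):

```shell
# List running JVMs with their main classes; in a standalone cluster the
# Spark master runs org.apache.spark.deploy.master.Master.
jps -l

# Then dump that PID's threads (replace <master-pid> with the number
# printed next to ...deploy.master.Master):
# jstack <master-pid> > master-threads.txt
```

Dumping the SBT process instead will just show SBT parked in `Object.wait` on the forked driver's `UNIXProcess`, as in the trace above.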
> Finally, it works with RDD.join. What we have done is basically
> transform the 2 tables into 2 pair RDDs, then call a join operation.
> It works great, finishing in about 500 s.
>
> However, a workaround is just a workaround, since we have to transform
> rdd2("receipt_id"))
> rddJoin.saveAsTable("testJoinTable", SaveMode.Overwrite)
>
> The RDD workaround in this case is a bit cumbersome; in short, we just
> created 2 RDDs, joined them, and then applied a new schema to the result
> RDD. This approach works, at least all tasks were finished, while with the
> DF/SQL approach they weren't.
> Any idea?
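The workaround described in the quoted message could look roughly like this. A minimal sketch, assuming Spark 1.3+; `df1`, `df2`, and the `String` type of the `receipt_id` key are assumptions, since the original code was truncated:

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.StructType

// Join two DataFrames by dropping to pair RDDs keyed on receipt_id,
// then re-applying a combined schema to the joined rows.
def joinViaRdd(sqlContext: SQLContext, df1: DataFrame, df2: DataFrame): DataFrame = {
  // 1. Turn each DataFrame into a pair RDD keyed by receipt_id.
  val pair1 = df1.rdd.map(r => (r.getAs[String]("receipt_id"), r))
  val pair2 = df2.rdd.map(r => (r.getAs[String]("receipt_id"), r))

  // 2. Plain RDD inner join on the key.
  val joined = pair1.join(pair2)

  // 3. Concatenate the matched rows and the two schemas, then
  //    rebuild a DataFrame from the result.
  val rows   = joined.map { case (_, (l, r)) => Row.fromSeq(l.toSeq ++ r.toSeq) }
  val schema = StructType(df1.schema.fields ++ df2.schema.fields)
  sqlContext.createDataFrame(rows, schema)
}
```

Note this only sketches the "create 2 RDDs, join, apply a new schema" shape; duplicate column names from the two inputs would still need to be disambiguated before calling `saveAsTable`.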