Re: Spark SQL : Join operation failure

2017-02-23 Thread neil90
It might be a memory issue. Try adding .persist(StorageLevel.MEMORY_AND_DISK) so that any partitions of the RDD that don't fit in memory are spilled to disk.
cm_go.registerTempTable("x")
ko.registerTempTable("y")
joined_df = sqlCtx.sql("select * from x FULL OUTER JOIN y ON field1=field2")
joined_

Re: Spark SQL : Join operation failure

2017-02-22 Thread Yong Zhang
Your error message does not make clear what is really happening. Is your container being killed by YARN, or is it genuinely running out of memory (OOM)? When I run a Spark job on big data, here is what I normally do: 1) Enable GC output. You need to monitor the GC output in the executor to understand the GC pressure. If