Re: Mapper side join with DataFrames API

2016-03-05 Thread Deepak Gopalakrishnan
... how to do the map side join in Spark.

In 1.5.x, there is a broadcast function in the DataFrame, and it caused an OOM for my simple test case, even though one side of the join is very small.

I am still trying to find out the root cause. ...

Re: Mapper side join with DataFrames API

2016-03-04 Thread Deepak Gopalakrishnan
... the root cause.

Yong

------
Date: Wed, 2 Mar 2016 15:38:29 +0530
Subject: Re: Mapper side join with DataFrames API
From: dgk...@gmail.com
To: mich...@databricks.com
CC: u...@spark.apache

Re: Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
Hello All,

To add a bit more context to this question: I have a join as stated above, and I see the following in my executor logs:

16/02/29 17:02:35 INFO TaskSetManager: Finished task 198.0 in stage 7.0 (TID 1114) in 20354 ms on localhost (196/200)
16/02/29 17:02:35 INFO ShuffleBlockFetcher...
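The `(196/200)` counter in the log line suggests the stage ran 200 tasks, which matches the default value (200) of Spark SQL's `spark.sql.shuffle.partitions`. That hints, though it does not prove, that the join ran as a shuffle-based join rather than a map-side one. The plain-Python sketch below is not Spark code, and `partition_by_key` and `shuffle_hash_join` are hypothetical names. It illustrates the idea: both inputs are hash-partitioned on the join key into the same number of buckets, and each bucket pair is then joined independently, roughly one task per bucket.

```python
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Route each row to a bucket by hashing its join key (the 'shuffle' step)."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[key]) % num_partitions].append(row)
    return buckets


def shuffle_hash_join(left, right, key, num_partitions=200):
    """Inner-join two lists of dicts on `key`.

    Both sides are redistributed into `num_partitions` buckets; matching keys
    always land in the same bucket index, so each bucket pair can be joined
    on its own (in Spark, each pair would roughly correspond to one task).
    Returns a list of (left_row, right_row) tuples.
    """
    left_parts = partition_by_key(left, key, num_partitions)
    right_parts = partition_by_key(right, key, num_partitions)
    result = []
    for lpart, rpart in zip(left_parts, right_parts):
        # Build a hash table on the right-hand bucket...
        table = defaultdict(list)
        for r in rpart:
            table[r[key]].append(r)
        # ...then probe it with every row of the matching left-hand bucket.
        for l in lpart:
            for r in table.get(l[key], []):
                result.append((l, r))
    return result
```

Because both sides are redistributed, every row of the larger table has to be moved between partitions, which is the cost a map-side join is meant to avoid.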

Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
Hello,

I'm trying to join two DataFrames, A and B, with:

sqlContext.sql("SELECT * FROM A INNER JOIN B ON A.a=B.a");

I have registered A and B as temp tables (registerTempTable) after loading these DataFrames from different sources. I need the join to be really fast, and I was wondering if th...
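The replies in this thread point to the `broadcast` hint available in the Spark 1.5.x DataFrame API (`org.apache.spark.sql.functions.broadcast` in Scala, used as e.g. `A.join(broadcast(B), "a")`). The plain-Python sketch below is not Spark's implementation, and the function names are hypothetical. It shows the technique that hint requests, a broadcast hash join: the small side becomes one in-memory hash table, every partition of the large side probes that same table locally, and the large side is never shuffled.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Build an in-memory lookup table from the small side of the join.

    In a broadcast join this whole table is shipped to every worker,
    so it must fit in memory there.
    """
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return dict(table)


def broadcast_hash_join(large_partitions, small_rows, key):
    """Map-side inner join: each partition of the large side is joined
    locally against the same (broadcast) table; no partition's rows move.
    Returns a list of (large_row, small_row) tuples.
    """
    table = build_hash_table(small_rows, key)  # built once, shared by all "tasks"
    out = []
    for partition in large_partitions:  # each iteration ~ one map task
        for row in partition:
            for match in table.get(row[key], []):
                out.append((row, match))
    return out
```

The trade-off is memory: the entire small table has to be held by every worker, so a table that is small on disk but much larger once deserialized is one plausible (unconfirmed) explanation for the OOM reported in the replies.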