Re: Question regarding join with multiple columns with pyspark

2015-08-13 Thread Dan LaBar
The DataFrame issue has been fixed in Spark 1.5. Refer to SPARK-7990 and the Stack Overflow question "Spark specify multiple column conditions for dataframe join". On Tue, Apr 28, 2015 at 12:55 PM, Ali Bajwa wrote:

Re: Question regarding join with multiple columns with pyspark

2015-04-28 Thread Ali Bajwa
Thanks again, Ayan! To close the loop on this issue, I have filed the JIRA below to track it: https://issues.apache.org/jira/browse/SPARK-7197 On Fri, Apr 24, 2015 at 8:21 PM, ayan guha wrote: > I just tested, your observation in DataFrame API is correct. It behaves > weirdly in case of

Re: Question regarding join with multiple columns with pyspark

2015-04-24 Thread ayan guha
I just tested your pr On 25 Apr 2015 10:18, "Ali Bajwa" wrote: > Any ideas on this? Any sample code to join 2 data frames on two columns? > > Thanks > Ali > > On Apr 23, 2015, at 1:05 PM, Ali Bajwa wrote: > > > Hi experts, > > > > Sorry if this is a n00b question or has already been answered...

Re: Question regarding join with multiple columns with pyspark

2015-04-24 Thread ayan guha
I just tested, and your observation about the DataFrame API is correct. It behaves weirdly in the case of a multiple-column join. (Maybe we should report a Jira?) Solution: You can go back to our good old composite key field concatenation method. Not ideal, but a workaround. (Of course you can use real SQL as well,
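
For readers unfamiliar with the composite key concatenation method mentioned above, here is a minimal sketch of the idea in plain Python rather than Spark. It is an illustration only, not the code from this thread: the helper names (`composite_key`, `join_on_composite_key`), the sample rows, and the separator choice are all assumptions made for the example. In Spark the same idea would mean adding one concatenated key column to each DataFrame and joining on that single column.

```python
# Illustration only: a plain-Python stand-in for a composite-key join.
# All names and sample data here are hypothetical.

SEP = "\x1f"  # unit separator; assumed never to appear inside key values


def composite_key(row, cols):
    """Concatenate the values of the join columns into one string key."""
    return SEP.join(str(row[c]) for c in cols)


def join_on_composite_key(left, right, cols):
    """Inner-join two lists of dict rows on several columns via one combined key."""
    # Index the right-hand rows by their composite key.
    index = {}
    for r in right:
        index.setdefault(composite_key(r, cols), []).append(r)

    # Look up each left-hand row by the same key and merge the matches.
    joined = []
    for l in left:
        for r in index.get(composite_key(l, cols), []):
            merged = dict(r)
            merged.update(l)
            joined.append(merged)
    return joined


left = [
    {"a": 1, "b": "x", "val1": 10},
    {"a": 1, "b": "y", "val1": 20},
]
right = [
    {"a": 1, "b": "x", "val2": 100},
    {"a": 2, "b": "y", "val2": 200},
]

result = join_on_composite_key(left, right, ["a", "b"])
print(result)
```

The separator matters: with bare concatenation the key pairs ("1", "23") and ("12", "3") would both become "123" and match each other incorrectly, which is one reason this method is a workaround rather than a substitute for a true multi-column join.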

Re: Question regarding join with multiple columns with pyspark

2015-04-24 Thread Ali Bajwa
Any ideas on this? Any sample code to join 2 data frames on two columns? Thanks Ali On Apr 23, 2015, at 1:05 PM, Ali Bajwa wrote: > Hi experts, > > Sorry if this is a n00b question or has already been answered... > > Am trying to use the data frames API in python to join 2 dataframes > with mor

Question regarding join with multiple columns with pyspark

2015-04-23 Thread Ali Bajwa
Hi experts, Sorry if this is a n00b question or has already been answered... Am trying to use the data frames API in python to join 2 dataframes with more than 1 column. The example I've seen in the documentation only shows a single column - so I tried this: Example code import pandas a