Yeah, I don't think that's going to work: with monotonically_increasing_id() you aren't guaranteed to get 1, 2, 3, etc. I think row_number() might be what you need to generate a join ID.
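A rough sketch of the row_number() idea (my own, untested; it assumes the df1/df2 from the thread below, and the column name "rid" is made up). Ordering the window by monotonically_increasing_id() keeps each DataFrame's existing row order, but note the unpartitioned window pulls all rows into a single partition:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// row_number() over a window yields a dense 1, 2, 3, ... sequence,
// unlike monotonically_increasing_id(), so the ids line up across
// the two DataFrames.
val w = Window.orderBy(monotonically_increasing_id())

val df1WithId = df1.withColumn("rid", row_number().over(w))
val df2WithId = df2.withColumn("rid", row_number().over(w))

val df3 = df1WithId.join(df2WithId, Seq("rid"), "inner").drop("rid")
```

The single-partition window is fine for small data but won't scale; for large inputs the RDD zip route below avoids that bottleneck.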
RDD has a .zip method, but (unless I'm forgetting!) DataFrame does not. You could .zip the two RDDs you get from the DataFrames, manually combine each pair of Rows into a single Row, and convert back to a DataFrame.

On Wed, May 12, 2021 at 10:47 AM kushagra deep <kushagra94d...@gmail.com> wrote:

> Thanks Raghavendra
>
> Will the ids for corresponding columns always be the same? Since
> monotonically_increasing_id() returns a number based on the partition ID
> and the row number within the partition, will it be the same for
> corresponding columns? Also, is it guaranteed that the two dataframes
> will be divided into logical Spark partitions with the same cardinality
> for each partition?
>
> Reg,
> Kushagra Deep
>
> On Wed, May 12, 2021, 21:00 Raghavendra Ganesh <raghavendr...@gmail.com>
> wrote:
>
>> You can add an extra id column and perform an inner join.
>>
>> val df1_with_id = df1.withColumn("id", monotonically_increasing_id())
>>
>> val df2_with_id = df2.withColumn("id", monotonically_increasing_id())
>>
>> df1_with_id.join(df2_with_id, Seq("id"), "inner").drop("id").show()
>>
>> +---------+---------+
>> |amount_6m|amount_9m|
>> +---------+---------+
>> |      100|      500|
>> |      200|      600|
>> |      300|      700|
>> |      400|      800|
>> |      500|      900|
>> +---------+---------+
>>
>> --
>> Raghavendra
>>
>> On Wed, May 12, 2021 at 6:20 PM kushagra deep <kushagra94d...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have two dataframes
>>>
>>> df1
>>>
>>> amount_6m
>>> 100
>>> 200
>>> 300
>>> 400
>>> 500
>>>
>>> And a second dataframe df2 below
>>>
>>> amount_9m
>>> 500
>>> 600
>>> 700
>>> 800
>>> 900
>>>
>>> The number of rows is the same in both dataframes.
>>>
>>> Can I merge the two dataframes to achieve the df below?
>>>
>>> df3
>>>
>>> amount_6m | amount_9m
>>>       100         500
>>>       200         600
>>>       300         700
>>>       400         800
>>>       500         900
>>>
>>> Thanks in advance
>>>
>>> Reg,
>>> Kushagra Deep
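For reference, the RDD .zip approach suggested at the top of the thread could be sketched roughly like this (my own untested sketch, assuming a `spark` session and the `df1`/`df2` from the thread). Note that RDD.zip requires both RDDs to have the same number of partitions and the same number of elements per partition, which ties back to Kushagra's cardinality question:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType

// Pair up rows positionally, then flatten each (Row, Row) pair into one Row.
val zippedRows = df1.rdd.zip(df2.rdd).map { case (r1, r2) =>
  Row.fromSeq(r1.toSeq ++ r2.toSeq)
}

// Combined schema: all columns of df1 followed by all columns of df2.
val mergedSchema = StructType(df1.schema.fields ++ df2.schema.fields)

val df3 = spark.createDataFrame(zippedRows, mergedSchema)
```

If the partitioning doesn't line up, zip will throw at runtime; a repartition (or collecting to a single partition for small data) may be needed first.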