Thanks Raghavendra,

Will the ids for corresponding rows always be the same? Since monotonically_increasing_id() returns a number based on the partition ID and the row number within that partition, will it be the same for corresponding rows? Also, is it guaranteed that the two dataframes will be divided into logical Spark partitions with the same cardinality for each partition?

Reg,
Kushagra Deep

On Wed, May 12, 2021, 21:00 Raghavendra Ganesh <[email protected]> wrote:

> You can add an extra id column and perform an inner join.
>
> val df1_with_id = df1.withColumn("id", monotonically_increasing_id())
>
> val df2_with_id = df2.withColumn("id", monotonically_increasing_id())
>
> df1_with_id.join(df2_with_id, Seq("id"), "inner").drop("id").show()
>
> +---------+---------+
> |amount_6m|amount_9m|
> +---------+---------+
> |      100|      500|
> |      200|      600|
> |      300|      700|
> |      400|      800|
> |      500|      900|
> +---------+---------+
>
> --
> Raghavendra
>
> On Wed, May 12, 2021 at 6:20 PM kushagra deep <[email protected]> wrote:
>
>> Hi All,
>>
>> I have two dataframes
>>
>> df1
>>
>> amount_6m
>> 100
>> 200
>> 300
>> 400
>> 500
>>
>> And a second data df2 below
>>
>> amount_9m
>> 500
>> 600
>> 700
>> 800
>> 900
>>
>> The number of rows is same in both dataframes.
>>
>> Can I merge the two dataframes to achieve below df
>>
>> df3
>>
>> amount_6m | amount_9m
>> 100         500
>> 200         600
>> 300         700
>> 400         800
>> 500         900
>>
>> Thanks in advance
>>
>> Reg,
>> Kushagra Deep
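
On the partition question raised at the top of this message: as far as I understand, monotonically_increasing_id() is documented as unique and increasing but not consecutive, with the partition ID in the upper bits and the per-partition record number in the lower bits. The ids of two dataframes would therefore line up only if both happen to be partitioned identically. A minimal sketch of one way around that, assuming the row order of each dataframe is itself stable, is to attach a consecutive 0-based index with RDD.zipWithIndex and join on it. The names ZipByPosition, withRowIndex, zipByPosition and row_idx are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object ZipByPosition {
  // Hypothetical helper: append a consecutive 0-based index column.
  // zipWithIndex numbers rows by partition order, then by position
  // inside each partition, so the result does not depend on how many
  // rows each partition holds.
  def withRowIndex(spark: SparkSession, df: DataFrame,
                   name: String = "row_idx"): DataFrame = {
    val indexed = df.rdd.zipWithIndex.map {
      case (row, idx) => Row.fromSeq(row.toSeq :+ idx)
    }
    val schema = StructType(
      df.schema.fields :+ StructField(name, LongType, nullable = false))
    spark.createDataFrame(indexed, schema)
  }

  // Hypothetical helper: pair rows of two dataframes by position.
  // Assumes both have the same number of rows; with an inner join,
  // unmatched trailing rows would be silently dropped.
  def zipByPosition(spark: SparkSession, left: DataFrame,
                    right: DataFrame): DataFrame = {
    val l = withRowIndex(spark, left)
    val r = withRowIndex(spark, right)
    l.join(r, Seq("row_idx"), "inner")
      .orderBy("row_idx") // restore the original order after the shuffle
      .drop("row_idx")
  }
}
```

With a local SparkSession named spark and the df1 and df2 from the original question, ZipByPosition.zipByPosition(spark, df1, df2).show() should print the two amount columns side by side. Note that zipWithIndex runs an extra Spark job when the RDD has more than one partition, and that the pairing is only as meaningful as the row order of the inputs.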
