Thanks Raghavendra,

Will the ids for corresponding rows always be the same? Since
monotonically_increasing_id() returns a number based on the partition id and
the row number within the partition, will it be the same for corresponding
rows? Also, is it guaranteed that the two dataframes will be divided into
logical Spark partitions with the same cardinality for each partition?
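
To make the concern concrete, here is a minimal sketch in plain Scala (no Spark needed). It assumes the layout described in the Spark documentation for monotonically_increasing_id(): partition id in the upper 31 bits, record number within the partition in the lower 33 bits. The object MonotonicIdSketch and the helpers idFor and idsFor are hypothetical names used only for illustration; they are not part of the Spark API.

```scala
// Illustrative only: mimics the documented bit layout of
// monotonically_increasing_id(); it does not call Spark.
object MonotonicIdSketch {
  // Hypothetical helper: partition index in the upper bits,
  // row position within the partition in the lower 33 bits.
  def idFor(partitionIndex: Int, rowInPartition: Long): Long =
    (partitionIndex.toLong << 33) + rowInPartition

  // Ids a dataframe would get if its partitions held the given row counts,
  // listed in partition order.
  def idsFor(partitionSizes: Seq[Int]): Seq[Long] =
    partitionSizes.zipWithIndex.flatMap { case (size, p) =>
      (0 until size).map(r => idFor(p, r.toLong))
    }

  def main(args: Array[String]): Unit = {
    // Same 5 rows, but split 3+2 in one dataframe and 2+3 in the other.
    val left  = idsFor(Seq(3, 2))
    val right = idsFor(Seq(2, 3))
    println(left)
    println(right)
    // Ids present on both sides; an inner join would keep only these.
    println(left.intersect(right))
  }
}
```

Under this assumption, the ids line up row for row only when both dataframes have the same number of rows in each partition, in the same partition order. If the split differs, some rows find no partner in the inner join and others pair with the wrong row.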

Reg,
Kushagra Deep

On Wed, May 12, 2021, 21:00 Raghavendra Ganesh <[email protected]>
wrote:

> You can add an extra id column and perform an inner join.
>
> val df1_with_id = df1.withColumn("id", monotonically_increasing_id())
>
> val df2_with_id = df2.withColumn("id", monotonically_increasing_id())
>
> df1_with_id.join(df2_with_id, Seq("id"), "inner").drop("id").show()
>
> +---------+---------+
> |amount_6m|amount_9m|
> +---------+---------+
> |      100|      500|
> |      200|      600|
> |      300|      700|
> |      400|      800|
> |      500|      900|
> +---------+---------+
>
>
> --
> Raghavendra
>
>
> On Wed, May 12, 2021 at 6:20 PM kushagra deep <[email protected]>
> wrote:
>
>> Hi All,
>>
>> I have two dataframes
>>
>> df1
>>
>> amount_6m
>>  100
>>  200
>>  300
>>  400
>>  500
>>
>> And a second dataframe, df2, below
>>
>>  amount_9m
>>   500
>>   600
>>   700
>>   800
>>   900
>>
>> The number of rows is the same in both dataframes.
>>
>> Can I merge the two dataframes to get the df below?
>>
>> df3
>>
>> amount_6m | amount_9m
>>       100 |       500
>>       200 |       600
>>       300 |       700
>>       400 |       800
>>       500 |       900
>>
>> Thanks in advance
>>
>> Reg,
>> Kushagra Deep
>>
>>
