Yeah, I don't think that's going to work: with monotonically_increasing_id() you aren't guaranteed to get 1, 2, 3, etc. I think row_number() might be what you need to generate a join ID.
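A rough sketch of the row_number() idea (my own, untested; it assumes the df1/df2 from the thread below, and the column name "rid" is made up). Ordering the window by monotonically_increasing_id() keeps each DataFrame's existing row order, but note the unpartitioned window pulls all rows into a single partition:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// row_number() over a window yields a dense 1, 2, 3, ... sequence,
// unlike monotonically_increasing_id(), so the ids line up across
// the two DataFrames.
val w = Window.orderBy(monotonically_increasing_id())

val df1WithId = df1.withColumn("rid", row_number().over(w))
val df2WithId = df2.withColumn("rid", row_number().over(w))

val df3 = df1WithId.join(df2WithId, Seq("rid"), "inner").drop("rid")
```

The single-partition window is fine for small data but won't scale; for large inputs the RDD zip route below avoids that bottleneck.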
RDD has a .zip method, but (unless I'm forgetting!) DataFrame does not. You could .zip the two RDDs you get from the DataFrames, manually combine each pair of Rows into a single Row, and convert back to a DataFrame.

On Wed, May 12, 2021 at 10:47 AM kushagra deep <kushagra94d...@gmail.com> wrote:

> Thanks Raghavendra
>
> Will the ids for corresponding columns always be the same? Since
> monotonically_increasing_id() returns a number based on the partition ID
> and the row number within the partition, will it be the same for
> corresponding columns? Also, is it guaranteed that the two dataframes
> will be divided into logical Spark partitions with the same cardinality
> for each partition?
>
> Reg,
> Kushagra Deep
>
> On Wed, May 12, 2021, 21:00 Raghavendra Ganesh <raghavendr...@gmail.com>
> wrote:
>
>> You can add an extra id column and perform an inner join.
>>
>> val df1_with_id = df1.withColumn("id", monotonically_increasing_id())
>>
>> val df2_with_id = df2.withColumn("id", monotonically_increasing_id())
>>
>> df1_with_id.join(df2_with_id, Seq("id"), "inner").drop("id").show()
>>
>> +---------+---------+
>> |amount_6m|amount_9m|
>> +---------+---------+
>> |      100|      500|
>> |      200|      600|
>> |      300|      700|
>> |      400|      800|
>> |      500|      900|
>> +---------+---------+
>>
>> --
>> Raghavendra
>>
>> On Wed, May 12, 2021 at 6:20 PM kushagra deep <kushagra94d...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have two dataframes
>>>
>>> df1
>>>
>>> amount_6m
>>> 100
>>> 200
>>> 300
>>> 400
>>> 500
>>>
>>> And a second dataframe df2 below
>>>
>>> amount_9m
>>> 500
>>> 600
>>> 700
>>> 800
>>> 900
>>>
>>> The number of rows is the same in both dataframes.
>>>
>>> Can I merge the two dataframes to achieve the df below?
>>>
>>> df3
>>>
>>> amount_6m | amount_9m
>>>       100         500
>>>       200         600
>>>       300         700
>>>       400         800
>>>       500         900
>>>
>>> Thanks in advance
>>>
>>> Reg,
>>> Kushagra Deep
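For reference, the RDD .zip approach suggested at the top of the thread could be sketched roughly like this (my own untested sketch, assuming a `spark` session and the `df1`/`df2` from the thread). Note that RDD.zip requires both RDDs to have the same number of partitions and the same number of elements per partition, which ties back to Kushagra's cardinality question:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType

// Pair up rows positionally, then flatten each (Row, Row) pair into one Row.
val zippedRows = df1.rdd.zip(df2.rdd).map { case (r1, r2) =>
  Row.fromSeq(r1.toSeq ++ r2.toSeq)
}

// Combined schema: all columns of df1 followed by all columns of df2.
val mergedSchema = StructType(df1.schema.fields ++ df2.schema.fields)

val df3 = spark.createDataFrame(zippedRows, mergedSchema)
```

If the partitioning doesn't line up, zip will throw at runtime; a repartition (or collecting to a single partition for small data) may be needed first.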