Re: Error joining dataframes

ram kumar Wed, 18 May 2016 01:30:12 -0700

I tried it, e.g.:

 df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")


+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+


If I try

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))



+----+----+----+
|   A|  id|   B|
+----+----+----+
|   0|null|null|
|   0|   2|   0|
|null|   3|   0|
+----+----+----+

The value "id" = 1 is lost: that row exists only in df1, so the remaining "id" column (from df2) is null for it.
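
One way to keep every id might be to keep both key columns from the full outer join and merge them with coalesce, which returns its first non-null argument. Below is a minimal sketch, assuming Spark 1.5.x and a local SQLContext. The object name MergeJoinKeys, the helper joinWithMergedId, and the sample rows (copied from the tables above) are made up for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.coalesce

// Hypothetical helper object, not part of Spark.
object MergeJoinKeys {

  // Full outer join on "Id", then collapse the two key columns into one.
  // coalesce picks df1's Id when it is non-null, otherwise df2's Id.
  def joinWithMergedId(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("Id") === df2("Id"), "fullouter")
      .select(coalesce(df1("Id"), df2("Id")).as("Id"), df1("A"), df2("B"))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, Spark 1.x style SQLContext.
    val conf = new SparkConf().setAppName("merge-join-keys").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Sample data matching the tables in this mail.
    val df1 = Seq((1, 0), (2, 0)).toDF("Id", "A")
    val df2 = Seq((2, 0), (3, 0)).toDF("Id", "B")

    val dfJoin = joinWithMergedId(df1, df2)
    dfJoin.registerTempTable("join_test")

    // Only one "Id" column is left, so the reference is no longer ambiguous.
    sqlContext.sql("select Id from join_test").show()

    sc.stop()
  }
}
```

With the sample rows, the query should return the ids 1, 2 and 3 (row order is not guaranteed), and "A" or "B" stays null where a row exists on only one side.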

On Wed, May 18, 2016 at 1:52 PM, Divya Gehlot <divya.htco...@gmail.com>
wrote:

> Can you try var df_join = df1.join(df2,df1( "Id") ===df2("Id"),
> "fullouter").drop(df1("Id"))
> On May 18, 2016 2:16 PM, "ram kumar" <ramkumarro...@gmail.com> wrote:
>
> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> <console>:27: error: type mismatch;
>  found   : String("Id")
>  required: org.apache.spark.sql.Column
>        var df_join = df1.join(df2, "Id", "fullouter")
>                                    ^
>
> scala>
>
> And I can't see the above method in
>
> https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)
>
> On Wed, May 18, 2016 at 2:22 AM, Bijay Kumar Pathak <bkpat...@mtu.edu>
> wrote:
>
>> Hi,
>>
>> Try this one:
>>
>>
>> df_join = df1.join(df2, 'Id', "fullouter")
>>
>> Thanks,
>> Bijay
>>
>>
>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I tried to join two dataframes:
>>>
>>> df_join = df1.join(df2, (df1("Id") === df2("Id")), "fullouter")
>>>
>>> df_join.registerTempTable("join_test")
>>>
>>>
>>> When querying "Id" from "join_test"
>>>
>>> 0: jdbc:hive2://> select Id from join_test;
>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>> ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>> 0: jdbc:hive2://>
>>>
>>> Is there a way to merge the values of df1("Id") and df2("Id") into one
>>> "Id" column?
>>>
>>> Thanks
>>>
>>
>>
>
