I tried df1.join(df2, df1("id") === df2("id"), "outer").show
But there is a duplicate "id" and when I query the "id", I get *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0) I am currently using spark 1.5.2. Is there any alternative way in 1.5 Thanks On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote: > Also, you can pass the query that you'd like to use in spark-v1.6+; > > val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A") > val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B") > df1.join(df2, df1("id") === df2("id"), "outer").show > > // maropu > > > On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarro...@gmail.com> > wrote: > >> If I run as >> val rs = s.join(t,"time_id").join(c,"channel_id") >> >> It takes as inner join. >> >> >> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >> >>> pretty simple, a similar construct to tables projected as DF >>> >>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC") >>> val t = >>> HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC") >>> val rs = s.join(t,"time_id").join(c,"channel_id") >>> >>> HTH >>> >>> Dr Mich Talebzadeh >>> >>> >>> >>> LinkedIn * >>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>> >>> >>> >>> http://talebzadehmich.wordpress.com >>> >>> >>> >>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote: >>> >>>> Hi, >>>> >>>> Try this one: >>>> >>>> >>>> df_join = df1.*join*(df2, 'Id', "fullouter") >>>> >>>> Thanks, >>>> Bijay >>>> >>>> >>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I tried to join two dataframe >>>>> >>>>> df_join = df1.*join*(df2, ((df1("Id") === df2("Id")), "fullouter") >>>>> >>>>> df_join.registerTempTable("join_test") >>>>> >>>>> >>>>> When querying "Id" from "join_test" >>>>> >>>>> 0: jdbc:hive2://> *select Id from join_test;* >>>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is >>>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0) >>>>> 0: jdbc:hive2://> >>>>> >>>>> Is there a way to merge the value of df1("Id") and df2("Id") into one >>>>> "Id" >>>>> >>>>> Thanks >>>>> >>>> >>>> >>> >> > > > -- > --- > Takeshi Yamamuro >