Ah, yes. `df_join` has the two `id` columns, so you need to select which one you use;
scala> :paste
// Entering paste mode (ctrl-D to finish)

val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
val df3 = df1.join(df2, df1("id") === df2("id"), "outer")
df3.printSchema
df3.select(df1("id")).show

// Exiting paste mode, now interpreting.

root
 |-- id: integer (nullable = true)
 |-- A: integer (nullable = true)
 |-- id: integer (nullable = true)
 |-- B: integer (nullable = true)

+----+
|  id|
+----+
|   1|
|   2|
|null|
+----+

On Wed, May 18, 2016 at 4:29 PM, ram kumar <ramkumarro...@gmail.com> wrote:

> When you register a temp table from the dataframe,
>
> e.g.:
> var df_join = df1.join(df2, df1("id") === df2("id"), "outer")
> df_join.registerTempTable("test")
>
> sqlContext.sql("select * from test")
>
> +----+----+----+----+
> |  id|   A|  id|   B|
> +----+----+----+----+
> |   1|   0|null|null|
> |   2|   0|   2|   0|
> |null|null|   3|   0|
> +----+----+----+----+
>
> but when you query the "id":
>
> sqlContext.sql("select id from test")
>
> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>
> On Wed, May 18, 2016 at 12:44 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Looks weird; spark-v1.5.x seems to accept the query.
>> What's the difference between the example and your query?
>>
>> ####
>>
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>>       /_/
>>
>> scala> :paste
>> // Entering paste mode (ctrl-D to finish)
>>
>> val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
>> val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>
>> // Exiting paste mode, now interpreting.
>>
>> +----+----+----+----+
>> |  id|   A|  id|   B|
>> +----+----+----+----+
>> |   1|   0|null|null|
>> |   2|   0|   2|   0|
>> |null|null|   3|   0|
>> +----+----+----+----+
>>
>> df1: org.apache.spark.sql.DataFrame = [id: int, A: int]
>> df2: org.apache.spark.sql.DataFrame = [id: int, B: int]
>>
>> On Wed, May 18, 2016 at 3:52 PM, ram kumar <ramkumarro...@gmail.com>
>> wrote:
>>
>>> I tried
>>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>>
>>> but there is a duplicate "id", and when I query the "id" I get
>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>
>>> I am currently using Spark 1.5.2.
>>> Is there any alternative way in 1.5?
>>>
>>> Thanks
>>>
>>> On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <
>>> linguin....@gmail.com> wrote:
>>>
>>>> Also, you can pass the query that you'd like to use in spark-v1.6+;
>>>>
>>>> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
>>>> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
>>>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>>>
>>>> // maropu
>>>>
>>>> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarro...@gmail.com>
>>>> wrote:
>>>>
>>>>> If I run it as
>>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>>
>>>>> it is taken as an inner join.
>>>>>
>>>>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Pretty simple, a similar construct to tables projected as DFs:
>>>>>>
>>>>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>>>>> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Try this one:
>>>>>>>
>>>>>>> df_join = df1.join(df2, 'Id', "fullouter")
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bijay
>>>>>>>
>>>>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I tried to join two dataframes:
>>>>>>>>
>>>>>>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>>>>>>>
>>>>>>>> df_join.registerTempTable("join_test")
>>>>>>>>
>>>>>>>> When querying "Id" from "join_test":
>>>>>>>>
>>>>>>>> 0: jdbc:hive2://> select Id from join_test;
>>>>>>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>>>>>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>>>>> 0: jdbc:hive2://>
>>>>>>>>
>>>>>>>> Is there a way to merge the values of df1("Id") and df2("Id") into
>>>>>>>> one "Id"?
>>>>>>>>
>>>>>>>> Thanks
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>
>> --
>> ---
>> Takeshi Yamamuro

--
---
Takeshi Yamamuro
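
For the original question (merging df1("Id") and df2("Id") into a single column on Spark 1.5.x, where the `usingColumns` + join-type overload is not available), the two key columns can be collapsed with `coalesce` after the outer join and only the merged column kept, so a later `select id` is unambiguous. A minimal sketch, assuming a spark-shell session with `sqlContext` and its implicits in scope as in the sessions above:

```scala
import org.apache.spark.sql.functions.coalesce
import sqlContext.implicits._

val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")

// Outer join keeps both "id" columns; coalesce picks whichever side
// is non-null, and we alias the result back to a single "id".
val joined = df1.join(df2, df1("id") === df2("id"), "outer")
val merged = joined.select(
  coalesce(df1("id"), df2("id")).as("id"),
  joined("A"),
  joined("B"))

merged.registerTempTable("test")
sqlContext.sql("select id from test").show  // no ambiguity error
```

On 1.6+, `df1.join(df2, Seq("id"), "outer")` should give the single merged `id` column directly, as suggested earlier in the thread.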