I tried it,
eg:
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+
if I try,
df_join = df1.join(df2, df1("Id") === df2("Id"),
Can you try
var df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
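For intuition, here is a plain-Scala sketch (no Spark needed; the data mirrors the df1/df2 examples later in this thread) of what the fullouter join produces, and why keeping only one key column removes the ambiguity:

```scala
// Full-outer-join semantics over df1 = [(1,0),(2,0)] and df2 = [(2,0),(3,0)],
// keyed by id. Pure Scala collections, no Spark required.
val df1 = Map(1 -> 0, 2 -> 0) // id -> A
val df2 = Map(2 -> 0, 3 -> 0) // id -> B
val allIds = (df1.keySet ++ df2.keySet).toList.sorted

// Each joined row carries BOTH sides' id, which is why "Id" is ambiguous:
val joined = allIds.map { id =>
  (df1.get(id).map(_ => id), df1.get(id), df2.get(id).map(_ => id), df2.get(id))
}

// Keeping a single key column (the effect drop() is after) removes the ambiguity:
val single = allIds.map(id => (id, df1.get(id), df2.get(id)))
```

Here `joined` has rows like (Some(1), Some(0), None, None), while `single` keeps one id per row.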
On May 18, 2016 2:16 PM, "ram kumar" wrote:
I tried
scala> var df_join = df1.join(df2, "Id", "fullouter")
:27: error: type mismatch;
found : String("Id")
required: org.apache.spark.sql.Column
Ah, yes. `df_join` has the two `id` columns, so you need to select which `id` you use;
scala> :paste
// Entering paste mode (ctrl-D to finish)
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
val df3 = df1.join(df2, df1("id") === df2("id"), "outer")
df3.show
When you register a temp table from the dataframe
eg:
var df_join = df1.join(df2, df1("id") === df2("id"), "outer")
df_join.registerTempTable("test")
sqlContext.sql("select * from test").show
+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+
Looks weird; it seems spark-v1.5.x can accept the query.
What's the difference between the example and your query?
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
scala> :paste
// Entering paste mode (ctrl-D to finish)
I tried
df1.join(df2, df1("id") === df2("id"), "outer").show
But there is a duplicate "id" and when I query the "id", I get
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
I am currently using spark 1.5.2.
Is
Also, you can run the query like this in spark-v1.6+;
val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
df1.join(df2, df1("id") === df2("id"), "outer").show
// maropu
On Wed, May 18, 2016 at 3:29 PM, ram kumar wrote:
> If
If I run as
val rs = s.join(t,"time_id").join(c,"channel_id")
It is treated as an inner join.
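As a plain-Scala sketch of that inner-join behaviour (no Spark; the sample rows, month names, and channel names below are made up for illustration):

```scala
// Chained inner joins: s joined to t on time_id, then to c on channel_id.
// Rows whose key has no match on the other side are dropped -- that is
// exactly what makes the chained join an inner join.
case class Sale(timeId: Int, channelId: Int, amount: Int)
val s = List(Sale(1, 10, 100), Sale(2, 20, 200), Sale(3, 99, 300))
val t = Map(1 -> "JAN", 2 -> "FEB")           // time_id -> CALENDAR_MONTH_DESC
val c = Map(10 -> "Direct", 20 -> "Internet") // channel_id -> CHANNEL_DESC

val rs = for {
  sale  <- s
  month <- t.get(sale.timeId)      // row dropped when time_id has no match
  chan  <- c.get(sale.channelId)   // row dropped when channel_id has no match
} yield (sale.amount, month, chan)
// Sale(3, 99, 300) disappears: 3 is not in t and 99 is not in c.
```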
On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh
wrote:
> pretty simple, a similar construct to tables projected as DF
>
> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
You can use the api in spark-v1.6+.
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L454
// maropu
On Wed, May 18, 2016 at 3:16 PM, ram kumar wrote:
> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> :27: error: type mismatch;
I tried
scala> var df_join = df1.join(df2, "Id", "fullouter")
:27: error: type mismatch;
found : String("Id")
required: org.apache.spark.sql.Column
var df_join = df1.join(df2, "Id", "fullouter")
^
scala>
And I can't see the above method in
https://spa
pretty simple, a similar construct to tables projected as DF
val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
val rs = s.join(t,"time_id").join(c,"channel_id")
HTH
Dr Mich Talebzadeh
Hi,
Try this one:
df_join = df1.join(df2, 'Id', "fullouter")
Thanks,
Bijay
On Tue, May 17, 2016 at 9:39 AM, ram kumar wrote:
> Hi,
>
> I tried to join two dataframe
>
> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>
> df_join.registerTempTable("join_test")
>
>
> When
Hi,
I tried to join two dataframe
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
df_join.registerTempTable("join_test")
When querying "Id" from "join_test"
0: jdbc:hive2://> select Id from join_test;
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)