I tried it,
eg:
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+
if I try,
df_join = df1.join(df2, df1("Id") === df2("Id"),
Can you try
var df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
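For intuition, here is a plain-Scala sketch (no Spark needed; the data mirrors the df1/df2 examples later in this thread) of what the fullouter join produces, and why keeping only one key column removes the ambiguity:

```scala
// Full-outer-join semantics over df1 = [(1,0),(2,0)] and df2 = [(2,0),(3,0)],
// keyed by id. Pure Scala collections, no Spark required.
val df1 = Map(1 -> 0, 2 -> 0) // id -> A
val df2 = Map(2 -> 0, 3 -> 0) // id -> B
val allIds = (df1.keySet ++ df2.keySet).toList.sorted

// Each joined row carries BOTH sides' id, which is why "Id" is ambiguous:
val joined = allIds.map { id =>
  (df1.get(id).map(_ => id), df1.get(id), df2.get(id).map(_ => id), df2.get(id))
}

// Keeping a single key column (the effect drop() is after) removes the ambiguity:
val single = allIds.map(id => (id, df1.get(id), df2.get(id)))
```

Here `joined` has rows like (Some(1), Some(0), None, None), while `single` keeps one id per row.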
On May 18, 2016 2:16 PM, "ram kumar" wrote:
I tried
scala> var df_join = df1.join(df2, "Id", "fullouter")
:27: error: type mismatch;
found : String("Id")
required: org.apache.spark.sql.Column
Ah, yes. `df_join` has the two `id` columns, so you need to select which `id` you use;
scala> :paste
// Entering paste mode (ctrl-D to finish)
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
val df3 = df1.join(df2, df1("id") === df2("id"), "outer")
df3.show
When you register a temp table from the dataframe
eg:
var df_join = df1.join(df2, df1("id") === df2("id"), "outer")
df_join.registerTempTable("test")
sqlContext.sql("select * from test").show
+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+
Looks weird; it seems spark-v1.5.x can accept the query.
What's the difference between the example and your query?
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
scala> :paste
// Entering paste mode (ctrl-D to finish)
I tried
df1.join(df2, df1("id") === df2("id"), "outer").show
But there is a duplicate "id" and when I query the "id", I get
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
I am currently using spark 1.5.2.
Is
Also, you can run the query like this in spark-v1.6+;
val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
df1.join(df2, df1("id") === df2("id"), "outer").show
// maropu
On Wed, May 18, 2016 at 3:29 PM, ram kumar wrote:
> If
If I run as
val rs = s.join(t,"time_id").join(c,"channel_id")
It is treated as an inner join.
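As a plain-Scala sketch of that inner-join behaviour (no Spark; the sample rows, month names, and channel names below are made up for illustration):

```scala
// Chained inner joins: s joined to t on time_id, then to c on channel_id.
// Rows whose key has no match on the other side are dropped -- that is
// exactly what makes the chained join an inner join.
case class Sale(timeId: Int, channelId: Int, amount: Int)
val s = List(Sale(1, 10, 100), Sale(2, 20, 200), Sale(3, 99, 300))
val t = Map(1 -> "JAN", 2 -> "FEB")           // time_id -> CALENDAR_MONTH_DESC
val c = Map(10 -> "Direct", 20 -> "Internet") // channel_id -> CHANNEL_DESC

val rs = for {
  sale  <- s
  month <- t.get(sale.timeId)      // row dropped when time_id has no match
  chan  <- c.get(sale.channelId)   // row dropped when channel_id has no match
} yield (sale.amount, month, chan)
// Sale(3, 99, 300) disappears: 3 is not in t and 99 is not in c.
```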
On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh
wrote:
> pretty simple, a similar construct to tables projected as DF
>
> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
You can use the api in spark-v1.6+.
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L454
// maropu
On Wed, May 18, 2016 at 3:16 PM, ram kumar wrote:
> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> :27: error: type mismatch;
I tried
scala> var df_join = df1.join(df2, "Id", "fullouter")
:27: error: type mismatch;
found : String("Id")
required: org.apache.spark.sql.Column
var df_join = df1.join(df2, "Id", "fullouter")
^
scala>
And I can't see the above method in
https://spa
pretty simple, a similar construct to tables projected as DF
val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
val rs = s.join(t,"time_id").join(c,"channel_id")
HTH
Dr Mich Talebzadeh
Hi,
Try this one:
df_join = df1.join(df2, 'Id', "fullouter")
Thanks,
Bijay
On Tue, May 17, 2016 at 9:39 AM, ram kumar wrote:
> Hi,
>
> I tried to join two dataframe
>
> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>
> df_join.registerTempTable("join_test")
>
>
> When
Hi,
I tried to join two dataframe
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
df_join.registerTempTable("join_test")
When querying "Id" from "join_test"
0: jdbc:hive2://> select Id from join_test;
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)