Hello,
I would like to know whether there is a recommended way to prevent ambiguous
columns when joining DataFrames. When we join DataFrames, it often happens
that the join columns have identical names on both sides. I can rename the
column on the right DataFrame, as in the following code. Is there a better
way to achieve this?

scala> val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")))
df: org.apache.spark.sql.DataFrame = [_1: int, _2: string]
scala> val df2 = sqlContext.createDataFrame(Seq((1, 10), (2, 20), (3, 30), (4, 40)))
df2: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
scala> df.join(df2.withColumnRenamed("_1", "right_key"), $"_1" === $"right_key").printSchema

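For comparison, the only alternative I am aware of is to alias each side and
qualify the columns by alias instead of renaming them. This is only a minimal
sketch: it assumes a spark-shell session where sqlContext is predefined and a
Spark version in which DataFrame.as(alias) is available. The aliases "l" and
"r" and the output name "value" are illustrative names, not anything required
by the API.

```scala
// Assumes spark-shell, where `sqlContext` already exists.
import sqlContext.implicits._  // enables the $"..." column syntax

val df  = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")))
val df2 = sqlContext.createDataFrame(Seq((1, 10), (2, 20), (3, 30), (4, 40)))

// Alias both sides so that each column can be referred to as "<alias>.<name>".
// Then keep one copy of the key and give the right-hand value column a
// distinct name ("value" is a hypothetical choice).
val joined = df.as("l").
  join(df2.as("r"), $"l._1" === $"r._1").
  select($"l._1", $"l._2", $"r._2".as("value"))

joined.printSchema
```

This avoids the rename step, but the select still has to list the columns
explicitly, so I am not sure it is really better.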
Thanks.
Justin
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Best-practice-to-avoid-ambiguous-columns-in-DataFrame-join-tp22907.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.