Hello, thank you for the response. I found a blog post explaining that it is not possible to join columns from different DataFrames directly.
I was trying to modify one column's values: selecting it and then attempting to replace the original DataFrame's column with the result. I found another way, thanks.

Saif

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com]
Sent: Wednesday, August 26, 2015 8:54 PM
To: Ellafi, Saif A.; user@spark.apache.org
Subject: Re: Help! Stuck using withColumn

Hi Saif,

In both cases you're referencing columns that don't exist in the current DataFrame.

In the first email you did a select and then a withColumn for 'month_date_curr' on the resulting DataFrame, but that column no longer exists, because you selected only 'month_balance'.

In the second email you're using two different DataFrames and trying to select a column from one in a withColumn on the other; that simply won't work. Also, the two DataFrames share no column names, so that column doesn't exist there either. Did you intend to do a join instead?

Thanks,
Silvio

From: saif.a.ell...@wellsfargo.com
Date: Wednesday, August 26, 2015 at 6:06 PM
To: saif.a.ell...@wellsfargo.com, user@spark.apache.org
Subject: RE: Help! Stuck using withColumn
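[Editor's sketch] Silvio's suggestion above (use a join) needs a common key, which neither DataFrame in this thread has. One hedged option in Spark 1.x is to manufacture a positional key with RDD.zipWithIndex and join on it; the variable names below reuse the repro from this thread, and the approach assumes both DataFrames have the same number of rows:

    // Hypothetical sketch for a spark-shell session (Spark 1.x):
    // combine two unrelated single-column DataFrames positionally via a join.
    import sqlContext.implicits._

    val gf = sc.parallelize(Array(3, 6, 4, 7, 3, 4, 5, 5, 31, 4, 5, 2)).toDF("ASD")
    val ff = sc.parallelize(Array(4, 6, 2, 3, 5, 1, 4, 6, 23, 6, 4, 7)).toDF("GFD")

    // Attach a positional key to each row, then join on it.
    val gfIdx = gf.rdd.zipWithIndex.map { case (row, i) => (i, row.getInt(0)) }.toDF("id", "ASD")
    val ffIdx = ff.rdd.zipWithIndex.map { case (row, i) => (i, row.getInt(0)) }.toDF("id", "GFD")

    val joined = gfIdx.join(ffIdx, "id").select("ASD", "GFD")

Note this relies on both sides having identical row counts; for DataFrames that already share a real key column, an ordinary join on that key is the safer choice.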
I can reproduce this even more simply with the following:

    val gf = sc.parallelize(Array(3, 6, 4, 7, 3, 4, 5, 5, 31, 4, 5, 2)).toDF("ASD")
    val ff = sc.parallelize(Array(4, 6, 2, 3, 5, 1, 4, 6, 23, 6, 4, 7)).toDF("GFD")
    gf.withColumn("DSA", ff.col("GFD"))

    org.apache.spark.sql.AnalysisException: resolved attribute(s) GFD#421 missing from ASD#419 in operator !Project [ASD#419,GFD#421 AS DSA#422];
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:121)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
        at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
        at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1039)

From: saif.a.ell...@wellsfargo.com
Sent: Wednesday, August 26, 2015 6:47 PM
To: user@spark.apache.org
Subject: Help! Stuck using withColumn
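[Editor's sketch] Since the two columns in the repro above come from two parallel collections of the same length, one hedged fix is to zip the source RDDs before ever creating a DataFrame, so both values land in the same rows; this assumes a spark-shell session and relies on RDD.zip, which requires both RDDs to have the same partitioning and element counts:

    // Hypothetical sketch: build one DataFrame instead of two,
    // so withColumn-style cross-DataFrame references are never needed.
    import sqlContext.implicits._

    val asd = sc.parallelize(Array(3, 6, 4, 7, 3, 4, 5, 5, 31, 4, 5, 2))
    val gfd = sc.parallelize(Array(4, 6, 2, 3, 5, 1, 4, 6, 23, 6, 4, 7))

    // zip pairs elements positionally; toDF names the resulting columns.
    val df = asd.zip(gfd).toDF("ASD", "DSA")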
This simple command call:

    val final_df = data.select("month_balance").withColumn("month_date", data.col("month_date_curr"))

is throwing:

    org.apache.spark.sql.AnalysisException: resolved attribute(s) month_date_curr#324 missing from month_balance#234 in operator !Project [month_balance#234, month_date_curr#324 AS month_date_curr#408];
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:121)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
        at org.apache.spark.sql.DataFrame.select(DataFrame.scala:595)
        at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1039)
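[Editor's sketch] The failure above happens because select("month_balance") drops month_date_curr before withColumn tries to read it. A hedged fix, keeping the column names from this thread, is to derive the new column first and narrow afterwards, so withColumn only ever references columns that still exist in the DataFrame it is called on:

    // Hypothetical sketch: do withColumn before select, then keep only
    // the columns actually wanted in the final result.
    val final_df = data
      .withColumn("month_date", data.col("month_date_curr"))
      .select("month_balance", "month_date")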