bug? using withColumn with colName with dot can't replace column

Emmanuel Tue, 15 Mar 2016 10:52:18 -0700

In Spark 1.6, if I do the following (the column name has a dot in it, but it is not a nested column):
df = df.withColumn("raw.hourOfDay", df.col("`raw.hourOfDay`"))

scala> df = df.withColumn("raw.hourOfDay", df.col("`raw.hourOfDay`"))
org.apache.spark.sql.AnalysisException: cannot resolve 'raw.minOfDay' given input columns raw.hourOfDay_2, raw.dayOfWeek, raw.sensor2, raw.hourOfDay, raw.minOfDay;
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:318)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:107)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:117)
        at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:121)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:121)
        at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:125)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

but if I do:
df = df.withColumn("raw.hourOfDay_2", df.col("`raw.hourOfDay`"))

scala> df.printSchema
root
 |-- raw.hourOfDay: long (nullable = true)
 |-- raw.minOfDay: long (nullable = true)
 |-- raw.dayOfWeek: long (nullable = true)
 |-- raw.sensor2: long (nullable = true)
 |-- raw.hourOfDay_2: long (nullable = true)
it works fine (i.e. the column is created).
The only difference is that the name "raw.hourOfDay_2" does not exist yet, and
it is properly created as a column name containing a dot, not as a nested column.
The documentation, however, says that if the column exists it will be replaced,
so it seems the column name is being misinterpreted as a nested column:


def withColumn(colName: String, col: Column): DataFrame
Returns a new DataFrame by adding a column or replacing the existing column
that has the same name.
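
For anyone hitting the same error, a possible workaround is to rebuild the projection by hand with every existing name backtick-quoted, instead of relying on withColumn to replace the column. This is only a sketch: it assumes the failure comes from the other existing columns being re-selected by unquoted name, which would fit the error naming 'raw.minOfDay' rather than the column being replaced. replaceDottedColumn is a hypothetical helper name, not part of the Spark API.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper (not part of Spark): replaces the column named colName
// by selecting every column explicitly. Names are wrapped in backticks so that
// a dot is treated as part of the name and not as nested-field access.
def replaceDottedColumn(df: DataFrame, colName: String, newCol: Column): DataFrame = {
  val projected = df.columns.map { name =>
    if (name == colName) newCol.as(colName) // substitute the new expression, keep the name
    else df.col(s"`$name`")                 // keep the existing column, quoted
  }
  df.select(projected: _*)
}
```

With that in place, the failing call above would become df = replaceDottedColumn(df, "raw.hourOfDay", df.col("`raw.hourOfDay`")). Column order is preserved because the projection follows df.columns.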


Any thoughts on why the behavior differs when the column already exists?

Thanks
