Hi, I want to rename an aggregation field using DataFrame API. The
aggregation is done on a nested field. But I got below exception.
Do you see the same issue and any workaround? Thank you very much!
======
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot resolve column name "SUM('p.q)" among (k, SUM('p.q));
at
org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:
162)
at
org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:
162)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:161)
at org.apache.spark.sql.DataFrame.col(DataFrame.scala:436)
at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:426)
at
org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:244)
at
org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:243)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.sc
ala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.sc
ala:244)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.s
cala:33)
at
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.DataFrame.toDF(DataFrame.scala:243)
======
And this code can be used to reproduce the issue:
case class ChildClass(q: Long)
case class ParentClass(k: String, p: ChildClass)
def main(args: Array[String]): Unit = {
val conf = new
SparkConf().setAppName("DFTest").setMaster("local[*]")
val ctx = new SparkContext(conf)
val sqlCtx = new HiveContext(ctx)
import sqlCtx.implicits._
val source = ctx.makeRDD(Seq(ParentClass("c1",
ChildClass(100)))).toDF
import org.apache.spark.sql.functions._
val target = source.groupBy('k).agg('k, sum("p.q"))
// This line prints the correct contents
// k SUM('p.q)
// c1 100
target.show
// But this line triggers the exception
target.toDF("key", "total")
======