I am trying to get the result schema of aggregate functions using the
DataFrame API.
However, I find that the result fields for the groupBy columns are always
nullable, even when the source field is not nullable.
I would like to know whether this is by design. Thank you! Below is a
simple code example that shows the issue.
======
import sqlContext.implicits._
import org.apache.spark.sql.functions._
case class Test(key: String, value: Long)
val df = sc.makeRDD(Seq(Test("k1",2),Test("k1",1))).toDF
val result = df.groupBy("key").agg($"key", sum("value"))
// The output below shows that the "key" column is nullable. Why?
result.printSchema
// root
// |-- key: string (nullable = true)
// |-- SUM(value): long (nullable = true)
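
To make the comparison concrete, here is a minimal sketch that prints the nullable flag of every field in both the source and the aggregated DataFrame. It assumes `df` and `result` from the snippet above are still in scope (for example in spark-shell); `describeNullability` is a hypothetical helper name, not part of Spark. As far as I know, reflection-based schema inference treats a `String` field as nullable and a `Long` field as non-nullable, so it is worth confirming what the source schema actually reports for "key" before comparing.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: print each field's name, data type, and nullable flag.
def describeNullability(label: String, frame: DataFrame): Unit = {
  println(label)
  frame.schema.fields.foreach { f =>
    println(s"  ${f.name}: ${f.dataType}, nullable = ${f.nullable}")
  }
}

// Compare the schema before and after the aggregation.
describeNullability("source", df)
describeNullability("aggregated", result)
```

Printing both schemas side by side shows whether the nullable flag changes during the aggregation or is already set on the source column.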