[SparkSQL 1.4.0] groupBy columns are always nullable?

Haopu Wang Mon, 11 May 2015 01:49:30 -0700

I am trying to get the result schema of aggregate functions using the
DataFrame API.

However, I find that the result fields for groupBy columns are always
nullable, even when the source field is not nullable.


I would like to know whether this is by design. Thank you! Below is a
simple code snippet that shows the issue.

======

  // Run in spark-shell, where sc and sqlContext are predefined.
  import sqlContext.implicits._
  import org.apache.spark.sql.functions._
  case class Test(key: String, value: Long)
  val df = sc.makeRDD(Seq(Test("k1", 2L), Test("k1", 1L))).toDF()

  val result = df.groupBy("key").agg($"key", sum("value"))

  // The output shows that the "key" column is nullable. Why?
  result.printSchema()
//    root
//     |-- key: string (nullable = true)
//     |-- SUM(value): long (nullable = true)
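As a follow-up check, here is a minimal sketch (not from the original post) that builds the source DataFrame with an explicit schema, so that `key` is definitely declared non-nullable, and then prints the nullability of `key` before and after the same `groupBy(...).agg(...)` shape used above. This is worth doing because Spark's case-class schema inference treats `String` fields as nullable, so it is also worth running `df.printSchema()` on the `Test` example itself. The object name `GroupByNullability` and the helper `sourceDF` are hypothetical; the sketch assumes Spark 1.4-era APIs (`SQLContext`, `createDataFrame(RDD[Row], StructType)`) and a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical helper object for comparing nullability before and after groupBy.
object GroupByNullability {
  // Builds a DataFrame whose "key" and "value" columns are explicitly declared non-nullable.
  def sourceDF(sc: SparkContext, sqlContext: SQLContext): DataFrame = {
    val schema = StructType(Seq(
      StructField("key", StringType, nullable = false),
      StructField("value", LongType, nullable = false)))
    val rows = sc.parallelize(Seq(Row("k1", 2L), Row("k1", 1L)))
    sqlContext.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GroupByNullability").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    try {
      val df = sourceDF(sc, sqlContext)
      // Same aggregation shape as the original snippet.
      val result = df.groupBy("key").agg(col("key"), sum("value"))

      df.printSchema()
      result.printSchema()

      // Print the nullability flag of "key" on each side of the aggregation.
      println(s"source key nullable:  ${df.schema("key").nullable}")
      println(s"grouped key nullable: ${result.schema("key").nullable}")
    } finally {
      sc.stop()
    }
  }
}
```

Comparing the two printed flags shows directly whether the grouping step itself changes the nullability of `key`, independent of how the source schema was inferred.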

