When I union two DataFrames where one of them contains lit(null) (or null) for a String column, calling getString(i) on a Row inside foreachPartition throws:

java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.String

Stack:

Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.String
    at org.apache.spark.sql.Row$class.getString(Row.scala:249)
    at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(rows.scala:192)
    at $anonfun$1.apply(<console>:48)

It is easy to reproduce with:

    val df0 = spark.sparkContext.parallelize(1 to 10).map { i => (i, i.toString) }.toDF("i", "iString")
    val a = df0.select($"i" as "i", lit(null) as "iString")

    /**
     * b.printSchema
     * root
     *  |-- i: integer (nullable = false)
     *  |-- iString: string (nullable = true)
     */
    val b = a.union(df0)

    b.foreachPartition { p =>
      if (p.hasNext) {
        val row = p.next()
        // throws java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.String
        println(row.getString(1))
      }
    }

Has anyone seen this issue? Should I create a ticket?

Thanks,
Will