[ https://issues.apache.org/jira/browse/SPARK-51892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Praneet Sharma updated SPARK-51892:
-----------------------------------
Description: Hi, we have a JSON file with one column of type array[array[struct]]. In Spark 3.5.1, reading this JSON with spark.read.json and an explicitly supplied schema fails with a ClassCastException. The same code worked in Spark 3.3.1.

*Code to reproduce* (uses the attached b.json as input):
{code:java}
import org.apache.spark.sql.types._

val nestedStructSchema = StructType(Seq(
  StructField("c_union", IntegerType, true),
  StructField("c_boolean", BooleanType, true),
  StructField("c_double", DoubleType, true),
  StructField("c_int", IntegerType, true),
  StructField("c_long", IntegerType, true),
  StructField("c_string", StringType, true)
))

val innerArraySchema = ArrayType(nestedStructSchema, true)
val outerArraySchema = ArrayType(innerArraySchema, true)

val finalSchema = StructType(Seq(
  StructField("array_array_struct", outerArraySchema, true)
))

val df3 = spark.read.schema(finalSchema).json("/home/devbld/Desktop/b.json")
df3.show
{code}

*Error*:
{code:java}
Caused by: java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to class org.apache.spark.sql.catalyst.util.ArrayData (org.apache.spark.sql.catalyst.expressions.GenericInternalRow and org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader 'app')
  at org.apache.spark.sql.catalyst.util.GenericArrayData.getArray(GenericArrayData.scala:77)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.$anonfun$apply$1(FileFormat.scala:156)
  at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:197)
{code}
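The attached b.json is not reproduced in this message. For reference, a minimal single-line record that conforms to the schema above (field values are hypothetical; the actual attachment may differ) would look like:
{code:java}
{"array_array_struct": [[{"c_union": 1, "c_boolean": true, "c_double": 1.5, "c_int": 7, "c_long": 42, "c_string": "a"}]]}
{code}
As a quick way to narrow the problem down (a diagnostic sketch, assuming the same file path as above): reading the file without a user-supplied schema exercises Spark's schema inference instead of the hand-built StructType.
{code:java}
// Diagnostic sketch: let Spark infer the schema instead of supplying one.
// If this read succeeds while the explicit-schema read throws, the failure
// is in applying the user-supplied schema, not in parsing the file itself.
val inferred = spark.read.json("/home/devbld/Desktop/b.json")
inferred.printSchema()
inferred.show(false)
{code}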
> Reading JSON file with schema array[array[struct]] fails with ClassCastException in Spark 3.5.1
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-51892
>                 URL: https://issues.apache.org/jira/browse/SPARK-51892
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.1
>        Environment: spark-shell of Spark 3.5.1
>            Reporter: Praneet Sharma
>           Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org