Praneet Sharma created SPARK-51892:
--------------------------------------
Summary: Reading JSON file with schema array[array[struct]] fails
with ClassCastException in Spark 3.5.1
Key: SPARK-51892
URL: https://issues.apache.org/jira/browse/SPARK-51892
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.5.1
Environment: spark-shell of Spark 3.5.1
Reporter: Praneet Sharma
Hi, we have a JSON file with one column of type array[array[struct]]. In Spark 3.5.1,
when we read this JSON using spark.read.json and pass the schema, it fails with
a ClassCastException. The same code used to work in Spark 3.3.1.
*Code to reproduce* (uses the attached b.json as input):
{code:java}
import org.apache.spark.sql.types._

val nestedStructSchema = StructType(Seq(
  StructField("c_union", IntegerType, true),
  StructField("c_boolean", BooleanType, true),
  StructField("c_double", DoubleType, true),
  StructField("c_int", IntegerType, true),
  StructField("c_long", IntegerType, true),
  StructField("c_string", StringType, true)
))

val innerArraySchema = ArrayType(nestedStructSchema, true)
val outerArraySchema = ArrayType(innerArraySchema, true)

val finalSchema = StructType(Seq(
  StructField("array_array_struct", outerArraySchema, true)
))

val df3 = spark.read.schema(finalSchema).json("/home/devbld/Desktop/b.json")
df3.show
{code}
*Error*:
{code:java}
Caused by: java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to class org.apache.spark.sql.catalyst.util.ArrayData (org.apache.spark.sql.catalyst.expressions.GenericInternalRow and org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader 'app')
    at org.apache.spark.sql.catalyst.util.GenericArrayData.getArray(GenericArrayData.scala:77)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.$anonfun$apply$1(FileFormat.scala:156)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:197)
{code}