Kontinuation commented on issue #1844: URL: https://github.com/apache/datafusion-comet/issues/1844#issuecomment-2940507433
> Possibly related to [#1823](https://github.com/apache/datafusion-comet/issues/1823) (i.e., not necessarily delta specific)?

Probably; the stack trace and error message look very similar. In our case, the exception was thrown when reading Delta logs, which are a collection of JSON files. Delta uses a custom schema covering all types of Delta transactions, and the resulting DataFrame contains rows that have null struct fields. Shuffle-writing these rows containing null struct fields throws the exception. We have constructed a minimal repro that does not require Delta:

```scala
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.types._

val testData = "{}\n"
// `dir` is a temporary directory provided by the surrounding test harness
val path = Paths.get(dir.toString, "test.json")
Files.write(path, testData.getBytes)

// Define the nested struct schema
val readSchema = StructType(
  Array(
    StructField(
      "metaData",
      StructType(
        Array(StructField(
          "format",
          StructType(Array(StructField("provider", StringType, nullable = true))),
          nullable = true))),
      nullable = true)))

// Read JSON with the custom schema and repartition. The repartitioned data contains null structs
val df = spark.read.format("json").schema(readSchema).load(path.toString).repartition(2)
df.show() // <-- throws exception
```