huaxingao commented on code in PR #50246: URL: https://github.com/apache/spark/pull/50246#discussion_r2021813936
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala:
##########

@@ -273,9 +273,8 @@ trait RewriteRowLevelCommand extends Rule[LogicalPlan] {
      outputs: Seq[Seq[Expression]],
      colOrdinals: Seq[Int],
      attrs: Seq[Attribute]): ProjectingInternalRow = {
-    val schema = StructType(attrs.zipWithIndex.map { case (attr, index) =>
-      val nullable = outputs.exists(output => output(colOrdinals(index)).nullable)
-      StructField(attr.name, attr.dataType, nullable, attr.metadata)
+    val schema = StructType(attrs.zipWithIndex.map { case (attr, _) =>
+      StructField(attr.name, attr.dataType, attr.nullable, attr.metadata)

Review Comment:
   Thanks @amogh-jahagirdar for your comment! I took a closer look at why the test passed with the Spark 3.4 extensions but failed with Spark 4.0.

   In the Spark 3.4 extensions, when building the `metadataProjection`, we use [updateAndDeleteOutputs](https://github.com/apache/iceberg/blob/main/spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelIcebergCommand.scala#L94), which does not include the [INSERT_OPERATION](https://github.com/apache/iceberg/blob/main/spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelIcebergCommand.scala#L83C70-L83C86):

   <img width="499" alt="Screenshot 2025-03-31 at 2 06 21 PM" src="https://github.com/user-attachments/assets/3d14922d-6b22-4ea0-ac6a-16c0cb22e353" />

   Here `_spec_id` has `nullable = false` and `_partition` has `nullable = true`.
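The removed nullability computation can be sketched in isolation. This is a minimal, self-contained model, not the actual Spark API: `Expr` and `unionNullable` are hypothetical stand-ins for the expression outputs and the `outputs.exists(...)` logic in the diff above. It shows how a column becomes nullable as soon as any one operation's output marks it nullable:

```scala
// Hypothetical stand-in for an output expression; only nullability matters here.
case class Expr(nullable: Boolean)

// Mirrors the removed logic: a column is nullable if ANY operation's
// output expression for that column ordinal is nullable.
def unionNullable(outputs: Seq[Seq[Expr]], colOrdinal: Int): Boolean =
  outputs.exists(output => output(colOrdinal).nullable)

// With only an update/delete output, a metadata column like _spec_id
// keeps nullable = false.
val updateDelete = Seq(Seq(Expr(nullable = false)))
assert(!unionNullable(updateDelete, 0))

// Adding a second output that projects null for that column (as a
// re-insert output would) flips the computed nullability to true.
val withReinsert = updateDelete :+ Seq(Expr(nullable = true))
assert(unionNullable(withReinsert, 0))
```

Using `attr.nullable` directly, as in the `+` lines of the diff, sidesteps this union entirely, so extra outputs cannot widen the nullability of the projected schema.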
   In Spark 4.0, when building the `metadataProjection`, we use [outputsWithMetadata](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala#L196), which includes [REINSERT_OPERATION](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala#L44), so `outputs` contains two rows:

   <img width="540" alt="Screenshot 2025-03-31 at 2 32 31 PM" src="https://github.com/user-attachments/assets/bf7e2a3d-6710-4a3d-bc56-33b15453d7c0" />

   Since the second row has null for both `_spec_id` and `_partition`, the computed nullability for both metadata columns becomes true, which causes the schema verification against [MetadataSchema](https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWriteBuilder.java#L109) to fail.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org