parthchandra commented on code in PR #1755:
URL: https://github.com/apache/datafusion-comet/pull/1755#discussion_r2100776128


##########
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala:
##########
@@ -1233,7 +1233,9 @@ abstract class ParquetReadSuite extends CometTestBase {
 
             withParquetDataFrame(data, schema = Some(readSchema)) { df =>
              // TODO: validate with Spark 3.x and 'usingDataFusionParquetExec=true'
-              if (enableSchemaEvolution || usingDataSourceExec(conf)) {
+              if (enableSchemaEvolution || CometConf.COMET_NATIVE_SCAN_IMPL
+                  .get(conf)
+                  .equals(CometConf.SCAN_NATIVE_DATAFUSION)) {

Review Comment:
   Done



##########
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala:
##########
@@ -1233,7 +1233,9 @@ abstract class ParquetReadSuite extends CometTestBase {
 
             withParquetDataFrame(data, schema = Some(readSchema)) { df =>
              // TODO: validate with Spark 3.x and 'usingDataFusionParquetExec=true'
-              if (enableSchemaEvolution || usingDataSourceExec(conf)) {
+              if (enableSchemaEvolution || CometConf.COMET_NATIVE_SCAN_IMPL
+                  .get(conf)
+                  .equals(CometConf.SCAN_NATIVE_DATAFUSION)) {

Review Comment:
   The schemaEvolution flag applies only to CometScan (native_comet and native_iceberg_compat share that code path). CometNativeScan handles schema evolution differently and, in this test, always produces correct results instead of throwing an exception.
   Note that native_iceberg_compat could match the behaviour of native_datafusion, since it is the same implementation underneath, but we end up throwing an exception because the code path is shared.
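   To make the branching concrete, here is a minimal, self-contained sketch of the behaviour described above. `ScanImpl`, `expectsException`, and the case objects are hypothetical names for illustration only, not part of Comet's API:

```scala
// Hypothetical model of the three scan implementations discussed above.
sealed trait ScanImpl
case object NativeComet extends ScanImpl
case object NativeIcebergCompat extends ScanImpl
case object NativeDataFusion extends ScanImpl

object SchemaEvolutionSketch {
  // Per the review comment: native_comet and native_iceberg_compat share the
  // CometScan code path, so without the schemaEvolution flag they throw on a
  // schema mismatch; native_datafusion (CometNativeScan) handles evolution
  // itself and returns correct results regardless of the flag.
  def expectsException(impl: ScanImpl, enableSchemaEvolution: Boolean): Boolean =
    impl match {
      case NativeDataFusion => false
      case _                => !enableSchemaEvolution
    }
}
```

   This is why the test's condition accepts either `enableSchemaEvolution` or the native_datafusion scan implementation before asserting success.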



##########
common/src/main/scala/org/apache/spark/sql/comet/parquet/CometParquetReadSupport.scala:
##########
@@ -369,7 +363,7 @@ object CometParquetReadSupport {
   /**
    * Whether the parquet schema contains any field IDs.
    */
-  private def containsFieldIds(schema: Type): Boolean = schema match {
+  def containsFieldIds(schema: Type): Boolean = schema match {

Review Comment:
   The corresponding function is not private in Spark either, but you're right, 
we do not need it to be public at the moment though we might need to for better 
field id support. Changed it back to private.
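   For reference, the check is a simple recursion over the schema tree. This sketch uses a toy schema type rather than Parquet's real `Type` class, so `SchemaNode`, `Primitive`, and `Group` are hypothetical stand-ins:

```scala
// Toy stand-ins for Parquet schema nodes; a real implementation would walk
// org.apache.parquet.schema.Type instead.
sealed trait SchemaNode { def fieldId: Option[Int] }
case class Primitive(fieldId: Option[Int]) extends SchemaNode
case class Group(fieldId: Option[Int], children: List[SchemaNode]) extends SchemaNode

object FieldIdSketch {
  // True if this node, or any descendant, carries a field ID.
  def containsFieldIds(node: SchemaNode): Boolean = node match {
    case Primitive(id)   => id.isDefined
    case Group(id, kids) => id.isDefined || kids.exists(containsFieldIds)
  }
}
```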



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

