parthchandra commented on code in PR #1930:
URL: https://github.com/apache/datafusion-comet/pull/1930#discussion_r2167792716


##########
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##########
@@ -258,11 +258,15 @@ case class CometScanRule(session: SparkSession) extends Rule[SparkPlan] {
   }
 
   private def selectScan(scanExec: FileSourceScanExec, partitionSchema: StructType): String = {
-    // TODO these checks are not yet exhaustive. For example, native_iceberg_compat does
-    //  not support reading from S3
 
     val fallbackReasons = new ListBuffer[String]()
 
+    // native_iceberg_compat only supports local filesystem and S3
+    if (!scanExec.relation.inputFiles
+        .forall(path => path.startsWith("file://") || path.startsWith("s3a://"))) {

Review Comment:
   This is the only way to get the file names, as far as I know.
   I don't think this adds much overhead. Also, the file names have to come
either from an InMemoryFileIndex (built by scanning a path) or from a table
definition, which must come from a catalog, so reading the catalog is
unavoidable.
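
   As a minimal sketch of the check under review (not the actual PR code): the diff tests raw URI prefixes with `startsWith`; an alternative is to compare parsed schemes via `java.net.URI`. The helper name `allFilesSupported` and the treatment of schemeless paths as local are my assumptions, not part of the PR.
   
   ```scala
   import java.net.URI
   
   // Hypothetical helper mirroring the prefix check in the diff above:
   // native_iceberg_compat supports only local filesystem and S3 (s3a) paths.
   def allFilesSupported(inputFiles: Seq[String]): Boolean =
     inputFiles.forall { path =>
       // Assumption: a path without a scheme is treated as local.
       val scheme = Option(new URI(path).getScheme).getOrElse("file")
       scheme == "file" || scheme == "s3a"
     }
   
   // allFilesSupported(Seq("file:///tmp/a.parquet", "s3a://bucket/b.parquet")) // true
   // allFilesSupported(Seq("hdfs://namenode/warehouse/a.parquet"))             // false
   ```
   
   Parsing the scheme rather than matching prefixes would also accept forms like `file:/tmp/a.parquet`, which `startsWith("file://")` rejects; whether that is desirable depends on what `FileSourceScanExec.relation.inputFiles` actually returns.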



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

