andygrove commented on code in PR #1930: URL: https://github.com/apache/datafusion-comet/pull/1930#discussion_r2167730097
########## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##########
@@ -258,11 +258,15 @@ case class CometScanRule(session: SparkSession) extends Rule[SparkPlan] {
   }

   private def selectScan(scanExec: FileSourceScanExec, partitionSchema: StructType): String = {
-    // TODO these checks are not yet exhaustive. For example, native_iceberg_compat does
-    // not support reading from S3
     val fallbackReasons = new ListBuffer[String]()

+    // native_iceberg_compat only supports local filesystem and S3
+    if (!scanExec.relation.inputFiles
+        .forall(path => path.startsWith("file://") || path.startsWith("s3a://"))) {

Review Comment:
   @parthchandra I may need some guidance here. Just looking at the input files causes some stats to be updated (such as `HiveCatalogMetrics.METRIC_FILES_DISCOVERED` and `HiveCatalogMetrics.METRIC_FILE_CACHE_HITS`), leading to some test failures. I wonder if this could be adding some overhead. Do you know if there is a more efficient way for us to check for supported file locations?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
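For reference, the scheme check that the hunk above adds can be sketched as a standalone helper. This is a minimal illustration of the prefix test only, not Comet's actual code; `ScanPathSupport` and `allPathsSupported` are hypothetical names, and it does not address the metrics-overhead concern raised in the review comment.

```scala
// Hypothetical helper illustrating the path-scheme check from the diff.
// native_iceberg_compat is said to support only the local filesystem
// ("file://") and S3 via the s3a connector ("s3a://").
object ScanPathSupport {
  private val supportedPrefixes: Seq[String] = Seq("file://", "s3a://")

  // True only when every input file path starts with a supported scheme.
  def allPathsSupported(inputFiles: Seq[String]): Boolean =
    inputFiles.forall(path => supportedPrefixes.exists(path.startsWith))
}
```

In the rule itself, a failed check would append a message to `fallbackReasons` rather than throw, so the scan can fall back to another implementation.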