parthchandra commented on code in PR #2251:
URL: https://github.com/apache/datafusion-comet/pull/2251#discussion_r2310530181
##########
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##########
@@ -372,3 +388,64 @@ case class CometScanTypeChecker(scanImpl: String) extends DataTypeSupport with C
}
}
}
+
+object CometScanRule extends Logging {
+
+ /**
+ * Validating object store configs can cause requests to be made to S3 APIs (such as when
+ * resolving the region for a bucket). We use a cache to reduce the number of S3 calls.
+ *
+ * The key is the config map converted to a string. The value is the reason that the config is
+ * not valid, or None if the config is valid.
+ */
+ val configValidityMap = new mutable.HashMap[String, Option[String]]()
+
+ /**
+ * We do not expect to see a large number of unique configs within the lifetime of a Spark
+ * session, but we reset the cache once it reaches a fixed size to prevent it growing
+ * indefinitely.
+ */
+ val configValidityMapMaxSize = 1024
+
+ def validateObjectStoreConfig(
+ filePath: String,
+ hadoopConf: Configuration,
+ fallbackReasons: mutable.ListBuffer[String]): Unit = {
+ val objectStoreConfigMap =
+ NativeConfig.extractObjectStoreOptions(hadoopConf, URI.create(filePath))
+
+ val cacheKey = objectStoreConfigMap
+ .map { case (k, v) =>
+ s"$k=$v"
+ }
+ .toList
+ .sorted
+ .mkString("\n")
+
+ if (configValidityMap.size >= configValidityMapMaxSize) {
Review Comment:
I'm not sure this is sufficient. For instance, the user may have changed the
credentials provider class.
However, that is unlikely to happen in the middle of a session, so I'm fine
if you don't want to change this.
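
To make the concern concrete, here is a minimal, self-contained sketch of the caching pattern in the diff. It is not the PR's code: `ConfigValidityCacheSketch`, `validate`, and the `check` callback are hypothetical names, and a plain `Map[String, String]` stands in for whatever `NativeConfig.extractObjectStoreOptions` returns. The sketch only shows that a changed setting triggers re-validation if it is part of the map used to build the key; whether the credentials provider class is part of that map is an assumption here.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the cache in the diff; not the PR's implementation.
object ConfigValidityCacheSketch {
  // Key: config map rendered as a sorted string. Value: reason the config is
  // invalid, or None if it is valid (same convention as the diff).
  private val cache = new mutable.HashMap[String, Option[String]]()

  // Small limit so the reset is easy to observe; the diff uses 1024.
  val maxSize = 2

  // Order-independent key, built the same way as in the diff.
  def cacheKey(options: Map[String, String]): String =
    options.map { case (k, v) => s"$k=$v" }.toList.sorted.mkString("\n")

  // `check` is a hypothetical callback standing in for the real validation,
  // which may issue S3 requests.
  def validate(options: Map[String, String])(
      check: Map[String, String] => Option[String]): Option[String] = synchronized {
    val key = cacheKey(options)
    cache.get(key) match {
      case Some(cached) => cached
      case None =>
        // Reset the whole cache once it reaches the limit so it cannot grow unbounded.
        if (cache.size >= maxSize) cache.clear()
        val result = check(options)
        cache.put(key, result)
        result
    }
  }

  def size: Int = synchronized(cache.size)
}
```

With this shape, two maps that differ only in `fs.s3a.aws.credentials.provider` produce different keys and are validated separately. A setting that is not in the extracted map would not change the key, and the cached result would be reused.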