Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

via GitHub Tue, 25 Feb 2025 16:53:54 -0800


zhengruifeng commented on code in PR #50013:
URL: https://github.com/apache/spark/pull/50013#discussion_r1970742537



##########
sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLCache.scala:
##########
@@ -21,23 +21,52 @@ import java.util.concurrent.{ConcurrentMap, TimeUnit}
 
 import com.google.common.cache.CacheBuilder
 
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.{Logging, LogKeys, MDC}
+import org.apache.spark.ml.Model
 import org.apache.spark.ml.util.ConnectHelper
+import org.apache.spark.sql.classic.SparkSession
+import org.apache.spark.sql.connect.config.Connect._
+import org.apache.spark.util.SizeEstimator
 
 /**
  * MLCache is for caching ML objects, typically for models and summaries evaluated by a model.
  */
-private[connect] class MLCache extends Logging {
+private[connect] class MLCache(session: SparkSession) extends Logging {
   private val helper = new ConnectHelper()
   private val helperID = "______ML_CONNECT_HELPER______"
+  private def conf = session.sessionState.conf
 
-  private val cachedModel: ConcurrentMap[String, Object] = CacheBuilder
-    .newBuilder()
-    .softValues()
-    .maximumSize(MLCache.MAX_CACHED_ITEMS)
-    .expireAfterAccess(MLCache.CACHE_TIMEOUT_MINUTE, TimeUnit.MINUTES)
-    .build[String, Object]()
-    .asMap()
+  private val cachedModel: ConcurrentMap[String, (Object, Long)] = {
+    val builder = CacheBuilder.newBuilder().softValues()
+
+    val cacheWeight = conf.getConf(CONNECT_SESSION_ML_CACHE_TOTAL_ITEM_SIZE)
+    val cacheSize = conf.getConf(CONNECT_SESSION_ML_CACHE_SIZE)
+    val timeOut = conf.getConf(CONNECT_SESSION_ML_CACHE_TIMEOUT)
+
+    if (cacheWeight > 0) {
+      builder

Review Comment:
   The Javadoc says:
   
   ```
      *
      * <p>When eviction is necessary, the cache evicts entries that are less likely to be used again.
      * For example, the cache may evict an entry because it hasn't been used recently or very often.
      *
   ```
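
For context, here is a minimal sketch of a weight-bounded Guava cache, which is the kind of cache the quoted Javadoc describes. It is not the code from this PR: the object name `WeightedCacheSketch`, the helper `newCache`, the byte-array values, and the 100-byte budget are all hypothetical, and `concurrencyLevel(1)` is set only so that the whole budget applies to a single segment.

```scala
import com.google.common.cache.{Cache, CacheBuilder, Weigher}

// Hypothetical illustration only; names and sizes are not taken from the PR.
object WeightedCacheSketch {

  // Builds a cache whose total weight (here: value length in bytes)
  // is bounded by maxWeightBytes.
  def newCache(maxWeightBytes: Long): Cache[String, Array[Byte]] =
    CacheBuilder
      .newBuilder()
      // Single segment, so the weight limit is not split across segments.
      .concurrencyLevel(1)
      .maximumWeight(maxWeightBytes)
      // Explicit type arguments keep Scala from inferring the wrong key/value types.
      .weigher[String, Array[Byte]](new Weigher[String, Array[Byte]] {
        override def weigh(key: String, value: Array[Byte]): Int = value.length
      })
      .build[String, Array[Byte]]()

  def main(args: Array[String]): Unit = {
    val cache = newCache(100L)

    cache.put("a", new Array[Byte](40))
    cache.put("b", new Array[Byte](40))

    // Touch "a" so that "b" becomes the less recently used entry.
    cache.getIfPresent("a")

    // Total weight would now be 120 > 100, so the cache has to evict something.
    cache.put("c", new Array[Byte](40))

    // Which entry is evicted is up to Guava's policy; only the bound is guaranteed.
    println(s"entries after eviction: ${cache.size()}")
  }
}
```

The point of the sketch is that the caller controls only the weight bound and the weigher; the choice of which entry to drop is left to the cache, as the Javadoc above states.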



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

