Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

via GitHub Mon, 24 Feb 2025 18:37:10 -0800


hvanhovell commented on code in PR #50013:
URL: https://github.com/apache/spark/pull/50013#discussion_r1968733126



##########
mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala:
##########
@@ -235,6 +236,13 @@ class FMClassifier @Since("3.0.0") (
     model.setSummary(Some(summary))
   }
 
+  override def estimateModelSize(dataset: Dataset[_]): Long = {
+    val numFeatures = DatasetUtils.getNumFeatures(dataset, $(featuresCol))

Review Comment:
   I am worried that executing the input query multiple times will be 
considered wasteful. Should we do the check while fitting instead, so that 
we fail as soon as the model gets too large?
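
   To make the suggestion concrete, here is a rough sketch of a check done 
during fitting, so that the input is read only once. It is plain Scala with 
no Spark dependency. Every name in it (`FitWithSizeGuard`, `PartialModel`, 
`ModelTooLargeException`, `maxModelBytes`) is hypothetical and not part of 
this PR or of Spark, and the size formula is only a crude FM-style 
assumption (8 bytes per parameter).

```scala
// Hypothetical sketch only: none of these names exist in Spark or in this PR.
object FitWithSizeGuard {

  // Raised as soon as the size estimate exceeds the configured limit.
  final case class ModelTooLargeException(estimated: Long, limit: Long)
    extends RuntimeException(
      s"Estimated model size $estimated bytes exceeds limit $limit bytes")

  // Stand-in for a model whose shape is only known once data has been seen.
  final case class PartialModel(numFeatures: Int, factorSize: Int) {
    // Assumed estimate: intercept + linear weights + factor matrix,
    // 8 bytes (one Double) per parameter.
    def estimatedSizeBytes: Long =
      8L * (1L + numFeatures + numFeatures.toLong * factorSize)
  }

  // Single pass over the input: the feature count is taken from the first
  // row of the same iterator that training consumes, so the input is not
  // evaluated a second time just to size the model.
  def fit(
      rows: Iterator[Array[Double]],
      factorSize: Int,
      maxModelBytes: Long): PartialModel = {
    val buffered = rows.buffered
    require(buffered.hasNext, "empty training data")

    // Peek at the first row without consuming it.
    val numFeatures = buffered.head.length
    val model = PartialModel(numFeatures, factorSize)

    // Fail fast, before any training work is done.
    if (model.estimatedSizeBytes > maxModelBytes) {
      throw ModelTooLargeException(model.estimatedSizeBytes, maxModelBytes)
    }

    // Placeholder for the actual training loop over all rows.
    buffered.foreach(_ => ())
    model
  }
}
```

   The trade-off is that a failure is reported only after fitting has 
started, in exchange for not running the input query a second time just to 
size the model.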



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


