zhengruifeng commented on code in PR #50013:
URL: https://github.com/apache/spark/pull/50013#discussion_r1970735576


##########
mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala:
##########
@@ -235,6 +236,13 @@ class FMClassifier @Since("3.0.0") (
     model.setSummary(Some(summary))
   }
 
+  override def estimateModelSize(dataset: Dataset[_]): Long = {
+    val numFeatures = DatasetUtils.getNumFeatures(dataset, $(featuresCol))

Review Comment:
   `DatasetUtils.getNumFeatures` is quite cheap: it first tries to fetch 
`numFeatures` from the column metadata, and only if that metadata is missing 
does it infer `numFeatures` from the first row.
   
   
https://github.com/apache/spark/blob/9cf6dc873ff34412df6256cdc7613eed40716570/mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala#L206-L214
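To illustrate why this lookup order is cheap, here is a minimal, Spark-free sketch of the same fallback logic. It is not the linked Spark implementation. `NumFeaturesSketch` and `FeaturesColumn` are hypothetical stand-ins, assuming a features column can be modeled as an optional metadata size plus a lazy iterator of row vectors.

```scala
// Hypothetical model of the lookup order described in the review comment:
// prefer the size recorded in column metadata, otherwise read only the first row.
// Not Spark's DatasetUtils code; FeaturesColumn is an illustrative stand-in.
object NumFeaturesSketch {
  // metadataSize: Some(n) if the column metadata records the vector size.
  // rows: lazily produced feature vectors (stand-in for the dataset's rows).
  final case class FeaturesColumn(metadataSize: Option[Int], rows: Iterator[Array[Double]])

  def getNumFeatures(col: FeaturesColumn): Int =
    col.metadataSize match {
      case Some(n) =>
        n // cheap path: no row data is touched at all
      case None =>
        // fallback: materialize only the first row and use its length
        require(col.rows.hasNext, "cannot infer numFeatures from an empty column")
        col.rows.next().length
    }
}
```

With metadata present, `getNumFeatures(FeaturesColumn(Some(10), Iterator.empty))` returns 10 without reading any rows; without it, `getNumFeatures(FeaturesColumn(None, Iterator(Array(1.0, 2.0, 3.0))))` returns 3 after consuming just one element.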



##########
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:
##########
@@ -504,6 +506,10 @@ object Vectors {
 
   /** Max number of nonzero entries used in computing hash code. */
   private[linalg] val MAX_HASH_NNZ = 128
+
+  private[ml] def getSparseSize(nnz: Long): Long = nnz * 12 + 20

Review Comment:
   SG
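As a worked example of the `getSparseSize` formula in the diff (`nnz * 12 + 20`), the sketch below spells out the per-entry cost. It assumes the 12 bytes per nonzero correspond to a sparse vector's `Int` index (4 bytes) plus its `Double` value (8 bytes). The 20 is taken from the diff as a fixed overhead without claiming what it accounts for. `SparseSizeSketch` is a hypothetical name, not part of the PR.

```scala
// Illustrative breakdown of nnz * 12 + 20 from the diff.
// Assumption: 12 = 4-byte Int index + 8-byte Double value per stored entry.
// The constant 20 is copied from the diff and treated as a fixed overhead.
object SparseSizeSketch {
  val BytesPerIndex: Long = 4L  // one Int index per nonzero
  val BytesPerValue: Long = 8L  // one Double value per nonzero
  val FixedOverhead: Long = 20L // constant term from the diff

  def getSparseSize(nnz: Long): Long =
    nnz * (BytesPerIndex + BytesPerValue) + FixedOverhead
}
```

For example, a vector with 100 nonzeros is estimated at `100 * 12 + 20 = 1220` bytes, and an empty sparse vector at 20 bytes.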



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

