While working on spark.ml.classification.LogisticRegression.scala (Spark 1.5), I found that it might be useful to be able to create a summary for any given dataset, not just the training set. Currently, a BinaryLogisticRegressionTrainingSummary is only created when the model is fit on the training set. But, as usual, we need to summarize the test set to know how the model actually performs. However, we cannot create our own BinaryLogisticRegressionSummary for another data set (of type DataFrame), because the Summary class is "private" in the classification package.
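For context, the ROC curve that a binary summary exposes only needs (probability, label) pairs, which `model.transform(testSet)` already produces for any DataFrame. The plain-Scala sketch below (no Spark involved) shows that computation on a local collection. It is an illustration, not Spark's implementation. It assumes labels are 0.0/1.0 and the score is the probability of class 1.0, and `RocSketch`, `roc` and `areaUnderRoc` are hypothetical names.

```scala
// Hypothetical helper, not part of Spark: ROC points and AUC from local
// (score, label) pairs, assuming labels are 0.0 / 1.0.
object RocSketch {
  /** (falsePositiveRate, truePositiveRate) points, starting at (0.0, 0.0),
    * obtained by lowering the decision threshold through each distinct score. */
  def roc(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scoreAndLabels.count(_._2 == 1.0).toDouble
    val negatives = scoreAndLabels.size - positives
    require(positives > 0 && negatives > 0, "need at least one example of each class")
    // Distinct thresholds, highest first
    val thresholds = scoreAndLabels.map(_._1).distinct.sorted.reverse
    val points = thresholds.map { t =>
      // Everything scored at or above the threshold is predicted positive
      val predictedPos = scoreAndLabels.filter(_._1 >= t)
      val tp = predictedPos.count(_._2 == 1.0)
      val fp = predictedPos.size - tp
      (fp / negatives, tp / positives)
    }
    (0.0, 0.0) +: points
  }

  /** Area under the ROC curve via the trapezoidal rule. */
  def areaUnderRoc(points: Seq[(Double, Double)]): Double =
    points.zip(points.tail).map { case ((x1, y1), (x2, y2)) =>
      (x2 - x1) * (y1 + y2) / 2.0
    }.sum
}
```

For example, `RocSketch.roc(Seq((0.9, 1.0), (0.8, 0.0), (0.7, 1.0), (0.2, 0.0)))` gives the points (0,0), (0,0.5), (0.5,0.5), (0.5,1), (1,1), and an area of 0.75.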
Would it be better to remove the "private" access modifier and allow the following code on the user side:

    val lr = new LogisticRegression()
    val model = lr.fit(trainingSet)
    val binarySummary = new BinaryLogisticRegressionSummary(
      model.transform(testSet),
      lr.getProbabilityCol,
      lr.getLabelCol
    )
    binarySummary.roc

This way, we could use the model to summarize any data set we want.

If there is already a way to summarize a test set, please let me know. I have browsed LogisticRegression.scala but failed to find one.

Thx.

--
Hao Ren
Data Engineer @ leboncoin
Paris, France