We have kept that private because we still need to decide on a name for the method that evaluates on a test set (see the TODO comment <https://github.com/apache/spark/pull/7099/files#diff-668c79317c51f40df870d3404d8a731fR272>). Perhaps you could push for this to happen by creating a JIRA and pinging jkbradley and mengxr. Thanks!
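
In the meantime, one possible workaround is to build the test-set metrics yourself from the public spark.mllib BinaryClassificationMetrics class. The following is only a sketch, assuming Spark 1.5 (where DataFrame.map returns an RDD and the probability column holds an mllib Vector). The helper name testSetMetrics and the bin count of 100 are illustrative choices, not part of any Spark API.

```scala
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper: computes binary classification metrics for an
// arbitrary labelled DataFrame (for example, a held-out test set).
def testSetMetrics(
    model: LogisticRegressionModel,
    testSet: DataFrame): BinaryClassificationMetrics = {
  // Score the data; this adds the model's probability column.
  val predictions = model.transform(testSet)

  // Pair the probability of the positive class (index 1) with the true label.
  val scoreAndLabels = predictions
    .select(model.getProbabilityCol, model.getLabelCol)
    .map { case Row(prob: Vector, label: Double) => (prob(1), label) }

  // 100 bins is an assumed down-sampling setting for the curves.
  new BinaryClassificationMetrics(scoreAndLabels, 100)
}
```

With a fitted model, `testSetMetrics(model, testSet).roc()` should return the ROC curve as an RDD of (false positive rate, true positive rate) points, and `areaUnderROC()` the corresponding area.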
On Thu, Sep 17, 2015 at 8:07 AM, Hao Ren <inv...@gmail.com> wrote:

> Working on spark.ml.classification.LogisticRegression.scala (Spark 1.5),
>
> It might be useful if we could create a summary for any given dataset, not
> just the training set.
> Currently, BinaryLogisticRegressionTrainingSummary is only created when the
> model is fitted on the training set.
> As usual, we need to summarize the test set to learn about the model's
> performance.
> However, we cannot create our own BinaryLogisticRegressionSummary for
> another data set (of type DataFrame), because the Summary class is "private"
> in the classification package.
>
> Would it be better to remove the "private" access modifier and allow the
> following code on the user side:
>
> val lr = new LogisticRegression()
>
> val model = lr.fit(trainingSet)
>
> val binarySummary =
>   new BinaryLogisticRegressionSummary(
>     model.transform(testSet),
>     lr.getProbabilityCol,
>     lr.getLabelCol
>   )
>
> binarySummary.roc
>
> Thus, we could use the model to summarize any data set we want.
>
> If there is a way to summarize a test set, please let me know. I have browsed
> LogisticRegression.scala, but failed to find one.
>
> Thx.
>
> --
> Hao Ren
>
> Data Engineer @ leboncoin
>
> Paris, France