I am working with spark.ml.classification.LogisticRegression.scala (Spark 1.5).

It would be useful to be able to create a summary for any given dataset, not
just the training set.
Currently, BinaryLogisticRegressionTrainingSummary is only created when the
model is fitted on the training set.
Usually, we also need to summarize the test set to evaluate the model's
performance.
However, we cannot create our own BinaryLogisticRegressionSummary for
another dataset (of type DataFrame), because the Summary class is marked
"private" in the classification package.

Would it be better to remove the "private" access modifier and allow the
following code on the user side:

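val lr = new LogisticRegression()

val model = lr.fit(trainingSet)

// Proposed usage: pass the column names as Strings via the getters,
// since lr.probabilityCol and lr.labelCol are Param objects
val binarySummary =
  new BinaryLogisticRegressionSummary(
    model.transform(testSet),
    lr.getProbabilityCol,
    lr.getLabelCol
  )

binarySummary.roc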


This way, we could use the model to summarize any dataset we want.

If there is already a way to summarize the test set, please let me know. I
have browsed LogisticRegression.scala, but could not find one.

Thx.

-- 
Hao Ren

Data Engineer @ leboncoin

Paris, France
