Re: [MLlib] BinaryLogisticRegressionSummary on test set

Feynman Liang Fri, 18 Sep 2015 10:05:33 -0700

If you have the time, submitting a PR for it would be awesome! However, our
review bandwidth is limited, so you should not expect it to be reviewed
immediately. Let's continue the discussion of the name on JIRA.
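
In the meantime, a test set can probably be evaluated with classes that are
already public. The following is only a minimal sketch, assuming Spark 1.5, a
fitted LogisticRegressionModel, and a label column of type Double. The object
TestSetEvaluation and its methods areaUnderRoc and rocCurve are hypothetical
names used for illustration; they are not part of Spark.

```scala
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper object; these names are not part of Spark.
object TestSetEvaluation {

  // Area under the ROC curve of `model` on an arbitrary labeled DataFrame.
  def areaUnderRoc(model: LogisticRegressionModel, data: DataFrame): Double = {
    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol(model.getLabelCol)
      .setRawPredictionCol(model.getRawPredictionCol)
      .setMetricName("areaUnderROC")
    evaluator.evaluate(model.transform(data))
  }

  // ROC curve points as (false positive rate, true positive rate).
  def rocCurve(model: LogisticRegressionModel, data: DataFrame): RDD[(Double, Double)] = {
    val scoreAndLabels = model.transform(data)
      .select(model.getProbabilityCol, model.getLabelCol)
      .rdd
      .map { case Row(probability: Vector, label: Double) =>
        (probability(1), label) // probability of the positive class
      }
    new BinaryClassificationMetrics(scoreAndLabels).roc()
  }
}
```

With a model returned by lr.fit(trainingSet), calling
TestSetEvaluation.areaUnderRoc(model, testSet) should give the test-set AUC
without touching the private summary class.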


On Fri, Sep 18, 2015 at 2:47 AM, Hao Ren <inv...@gmail.com> wrote:

> Thank you for the reply.
>
> I have created a JIRA issue and pinged mengxr.
>
> Here is the link: https://issues.apache.org/jira/browse/SPARK-10691
>
> I did not find jkbradley on JIRA, though I see he is on GitHub.
>
> BTW, should I create a pull request that removes the private modifier, for
> further discussion?
>
> Thx.
>
> On Thu, Sep 17, 2015 at 6:44 PM, Feynman Liang <fli...@databricks.com>
> wrote:
>
>> We have kept that private because we need to decide on a name for the
>> method which evaluates on a test set (see the TODO comment
>> <https://github.com/apache/spark/pull/7099/files#diff-668c79317c51f40df870d3404d8a731fR272>);
>> perhaps you could push for this to happen by creating a Jira and pinging
>> jkbradley and mengxr. Thanks!
>>
>> On Thu, Sep 17, 2015 at 8:07 AM, Hao Ren <inv...@gmail.com> wrote:
>>
>>> I am working on spark.ml.classification.LogisticRegression.scala (Spark 1.5).
>>>
>>> It would be useful to be able to create a summary for any given dataset,
>>> not just the training set.
>>> Currently, BinaryLogisticRegressionTrainingSummary is only created when the
>>> model is fitted on the training set.
>>> As usual, we need to summarize the test set to assess the model's
>>> performance.
>>> However, we cannot create our own BinaryLogisticRegressionSummary for
>>> another data set (of type DataFrame), because the Summary class is "private"
>>> in the classification package.
>>>
>>> Would it be better to remove the "private" access modifier and allow the
>>> following code on the user side:
>>>
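>>> import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}
>>>
>>> // trainingSet and testSet are DataFrames with features and label columns.
>>> val lr = new LogisticRegression()
>>>
>>> val model = lr.fit(trainingSet)
>>>
>>> // The constructor takes column names as Strings, so use the getters
>>> // rather than the Param objects lr.probabilityCol and lr.labelCol.
>>> val binarySummary =
>>>   new BinaryLogisticRegressionSummary(
>>>     model.transform(testSet),
>>>     lr.getProbabilityCol,
>>>     lr.getLabelCol
>>>   )
>>>
>>> binarySummary.roc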
>>>
>>>
>>> Thus, we could use the model to summarize any data set we want.
>>>
>>> If there is already a way to summarize a test set, please let me know. I
>>> have browsed LogisticRegression.scala but failed to find one.
>>>
>>> Thx.
>>>
>>> --
>>> Hao Ren
>>>
>>> Data Engineer @ leboncoin
>>>
>>> Paris, France
>>>
>>
>>
>
>
> --
> Hao Ren
>
> Data Engineer @ leboncoin
>
> Paris, France
>
