[jira] [Commented] (LUCENE-5699) Lucene classification score calculation normalize and return lists

Michael McCandless (JIRA) Thu, 21 Aug 2014 16:38:45 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106153#comment-14106153 ]


Michael McCandless commented on LUCENE-5699:
--------------------------------------------

This commit caused "ant precommit" failures on trunk:

{noformat}
     [exec] build/docs/classification/org/apache/lucene/classification/SimpleNaiveBayesClassifier.html
     [exec]   missing Fields: analyzer
     [exec]   missing Fields: atomicReader
     [exec]   missing Fields: classFieldName
     [exec]   missing Fields: indexSearcher
     [exec]   missing Fields: query
     [exec]   missing Fields: textFieldNames
     [exec]   missing Methods: countDocsWithClass()
     [exec]   missing Methods: tokenizeDoc(java.lang.String)
     [exec]
     [exec] Missing javadocs were found!
{noformat}

Also, was it intentional that this wasn't backported to 4.x?

> Lucene classification score calculation normalize and return lists
> ------------------------------------------------------------------
>
>                 Key: LUCENE-5699
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5699
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: modules/classification
>            Reporter: Gergő Törcsvári
>            Assignee: Tommaso Teofili
>              Labels: gsoc2014
>             Fix For: 5.0
>
>         Attachments: 06-06-5699.patch, 0730.patch, 0803-base.patch, 0810-base.patch
>
>
> Currently the classifiers can return only the "best matching" class. If somebody
> wants to use them for more complex tasks, they need to modify these classes to
> get the second and third results too. If returning a list is possible and does
> not cost a lot of resources, why don't we do that? (We iterate over a list anyway.)
> The Bayes classifier produces very small return values, and there was a bug with
> float scores becoming zero. It was fixed by using logarithms. It would be nice to
> scale the class scores so that they sum to one; then we could compare the returned
> scores and relevance of two documents. (If we don't do this, the word count of the
> test documents affects the result score.)
> In bullet points:
> * In the Bayes classifier, normalize the score values and return result lists.
> * In the KNN classifier, add the possibility to return a result list.
> * Make ClassificationResult Comparable for list sorting.
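
The three requests in the issue description (normalized Bayes scores, result lists, and a Comparable result type) fit together roughly as follows. This is a minimal sketch, assuming the classifier already produces one log-score per class (log prior plus summed log likelihoods). `NormalizedScores`, `ScoredLabel`, `normalize`, and the sample labels are hypothetical and are not the Lucene classification API; the normalization shown is the standard log-sum-exp technique, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NormalizedScores {

    /** Hypothetical result type: a class label plus its score; sorts best-first. */
    static final class ScoredLabel implements Comparable<ScoredLabel> {
        final String label;
        final double score;

        ScoredLabel(String label, double score) {
            this.label = label;
            this.score = score;
        }

        @Override
        public int compareTo(ScoredLabel other) {
            // Descending order: the higher score comes first after sorting.
            return Double.compare(other.score, this.score);
        }
    }

    /**
     * Converts per-class log-scores into probabilities that sum to one.
     * Subtracting the maximum log-score first (log-sum-exp) keeps the largest
     * term at exp(0) = 1, so very negative log-scores do not all underflow to 0.
     */
    static List<ScoredLabel> normalize(Map<String, Double> logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logScores.values()) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (double v : logScores.values()) {
            sum += Math.exp(v - max);
        }
        List<ScoredLabel> results = new ArrayList<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            results.add(new ScoredLabel(e.getKey(), Math.exp(e.getValue() - max) / sum));
        }
        Collections.sort(results); // best match first, via compareTo
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical log-scores of the magnitude a long document can produce.
        Map<String, Double> logScores = new LinkedHashMap<>();
        logScores.put("sports", -1200.0);
        logScores.put("politics", -1201.0);
        logScores.put("tech", -1210.0);
        for (ScoredLabel r : normalize(logScores)) {
            System.out.printf("%s %.4f%n", r.label, r.score);
        }
    }
}
```

A naive `Math.exp(-1200.0)` evaluates to 0.0 in double precision, which is the kind of zero-float problem the reporter mentions; shifting by the maximum avoids it. Because the normalized scores sum to one, results from two different documents share the same 0-to-1 scale, and returning the full sorted list gives callers the second and third classes without modifying the classifier.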



--
This message was sent by Atlassian JIRA
(v6.2#6252)
