[
https://issues.apache.org/jira/browse/LUCENE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gergő Törcsvári updated LUCENE-5699:
------------------------------------
Attachment: 06-06-5699.patch
This patch includes all the features mentioned. It also contains some really
ugly changes caused by Eclipse's auto-formatting and automatic import
organizing.
It also contains the modifications for the online BayesClassifier.
The main changes:
* Instead of searching for the maximum, the classifier builds a list and sorts
it with Collections.sort.
* docsWithClassSize is now calculated on every search instead of only once.
* Because the results are a list, the scores can be scaled so that they sum to
1 (lines 180-201 in snbc).
The "online" function has not been tested yet; the scaling seems to work.
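For readers without the patch at hand, the scaling step can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class and method names (ScoreNormalization, normalizeLogScores) are made up, and it assumes the per-class scores are natural-log values, as the logarithmic fix mentioned in the issue suggests. It subtracts the largest log score before exponentiating so that very negative values do not underflow to zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or of the attached patch.
final class ScoreNormalization {

  // Turns log scores into values that sum to 1.
  // Assumption: the inputs are natural-log scores, one per class.
  static List<Double> normalizeLogScores(List<Double> logScores) {
    // Find the largest log score; subtracting it keeps exp() away from underflow.
    double max = Double.NEGATIVE_INFINITY;
    for (double s : logScores) {
      max = Math.max(max, s);
    }
    // Sum of the shifted exponentials (the largest term contributes exactly 1).
    double sum = 0.0;
    for (double s : logScores) {
      sum += Math.exp(s - max);
    }
    // Divide each shifted exponential by the sum.
    List<Double> normalized = new ArrayList<>();
    for (double s : logScores) {
      normalized.add(Math.exp(s - max) / sum);
    }
    return normalized;
  }

  public static void main(String[] args) {
    // Math.exp(-1000) alone would underflow to 0.0; the shifted version does not.
    List<Double> scaled = normalizeLogScores(Arrays.asList(-1000.0, -1001.0, -1002.0));
    System.out.println(scaled);
  }
}
```

Scaled this way, the scores depend only on the differences between the class log scores, so results for documents of different lengths become comparable.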
> Lucene classification score calculation normalize and return lists
> ------------------------------------------------------------------
>
> Key: LUCENE-5699
> URL: https://issues.apache.org/jira/browse/LUCENE-5699
> Project: Lucene - Core
> Issue Type: Sub-task
> Components: modules/classification
> Reporter: Gergő Törcsvári
> Assignee: Tommaso Teofili
> Attachments: 06-06-5699.patch
>
>
> Currently the classifiers can return only the "best matching" class. Anyone
> who wants to use them for more complex tasks has to modify these classes to
> get the second and third results too. If returning a list is possible and
> does not cost many resources, why don't we do that? (We iterate over a list
> anyway.)
> The Bayes classifier returns values that are too small, and there was a bug
> with zero floats, which was fixed by using logarithms. It would be nice to
> scale the class scores so that they sum to one; then we could compare the
> returned scores and relevance of two documents. (Without this, the word count
> of the test documents affects the resulting score.)
> In bullet points:
> * In the Bayes classifier, normalize the score values and return result
> lists.
> * In the KNN classifier, make it possible to return a result list.
> * Make ClassificationResult Comparable so that result lists can be sorted.
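A rough sketch of the last bullet point, assuming a result object that holds a class label and a score. ScoredClass and rank are made-up names standing in for the real ClassificationResult, whose actual fields and generics are not shown in this message. The comparison is reversed so that Collections.sort puts the best-scoring class first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a classification result; not the Lucene class.
final class ScoredClass implements Comparable<ScoredClass> {
  final String label;
  final double score;

  ScoredClass(String label, double score) {
    this.label = label;
    this.score = score;
  }

  // Reversed comparison: higher scores sort first.
  @Override
  public int compareTo(ScoredClass other) {
    return Double.compare(other.score, this.score);
  }
}

final class RankedResults {

  // Returns a sorted copy, best class first; the input list is left untouched.
  static List<ScoredClass> rank(List<ScoredClass> results) {
    List<ScoredClass> sorted = new ArrayList<>(results);
    Collections.sort(sorted);
    return sorted;
  }

  public static void main(String[] args) {
    List<ScoredClass> ranked = rank(Arrays.asList(
        new ScoredClass("sports", 0.2),
        new ScoredClass("politics", 0.7),
        new ScoredClass("tech", 0.1)));
    for (ScoredClass r : ranked) {
      System.out.println(r.label + " " + r.score);
    }
  }
}
```

A caller that needs the second or third best class can then read it from the list by index instead of modifying the classifier.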