[jira] [Updated] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

Remi Melisson (JIRA) Thu, 09 Jan 2014 08:37:53 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Remi Melisson updated LUCENE-5354:
----------------------------------

    Attachment: LUCENE-5354_3.patch

Hi!
Here is a new patch that incorporates your comment on the coefficient
calculation (I guess a lambda function would be perfect here!).

I ran the performance test on my laptop; here are the results compared to
AnalyzingInfixSuggester:
-- construction time
AnalyzingInfixSuggester input: 50001, time[ms]: 1780 [+- 367.58]
BlendedInfixSuggester input: 50001, time[ms]: 6507 [+- 2106.52]
-- prefixes: 2-4, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 6804 [+- 1403.13], ~kQPS: 7
BlendedInfixSuggester queries: 50001, time[ms]: 26503 [+- 2624.41], ~kQPS: 2
-- prefixes: 6-9, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 3995 [+- 551.20], ~kQPS: 13
BlendedInfixSuggester queries: 50001, time[ms]: 5355 [+- 1295.41], ~kQPS: 9
-- prefixes: 100-200, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 2626 [+- 588.43], ~kQPS: 19
BlendedInfixSuggester queries: 50001, time[ms]: 1980 [+- 574.16], ~kQPS: 25
-- RAM consumption
AnalyzingInfixSuggester size[B]:    1,430,920
BlendedInfixSuggester size[B]:    1,630,488

If you have any ideas on how we could improve the performance, let me know (see
my comment above on your previous suggestion to avoid visiting term vectors).

> Blended score in AnalyzingInfixSuggester
> ----------------------------------------
>
>                 Key: LUCENE-5354
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5354
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>    Affects Versions: 4.4
>            Reporter: Remi Melisson
>            Priority: Minor
>              Labels: suggester
>         Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch, 
> LUCENE-5354_3.patch
>
>
> I'm working on a custom suggester derived from AnalyzingInfixSuggester. I
> require what is called a "blended score" (//TODO ln.399 in
> AnalyzingInfixSuggester) to transform the suggestion weights depending on the
> position of the searched term(s) in the text.
> Right now, I'm using a simple solution:
> If I want 10 suggestions, I search the current ordered index for the first
> 100 results and transform the weight either:
>   a) by using the term position in the text (found with TermVector and
>      DocsAndPositionsEnum), or
>   b) by multiplying the weight by the score of a SpanQuery that I add when
>      searching,
> and then return the 10 highest-weighted suggestions.
> Since we usually don't need to suggest that many things, the overhead of the
> larger search plus rescoring is not very significant, but I agree that this is
> not the most elegant solution.
> We could include this factor (here, the position of the term) directly in
> the index.
> So, I can contribute this if you think it's worth adding.
> Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a
> dedicated class?
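
The over-fetch-and-rescore approach described in the quoted issue can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the code from the attached patches: the class
BlendedRescoreSketch, the Candidate type, and the LINEAR coefficient (1.0 at
position 0, dropping by 0.1 per position) are hypothetical names and values
chosen for the example, and the Lucene search that would produce the
candidates and their term positions is replaced by a plain in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class BlendedRescoreSketch {

    /** Hypothetical candidate: suggestion text, indexed weight, position of the first matched term. */
    static final class Candidate {
        final String text;
        final long weight;
        final int position;

        Candidate(String text, long weight, int position) {
            this.text = text;
            this.weight = weight;
            this.position = position;
        }
    }

    /** Assumed coefficient: 1.0 at position 0, minus 0.1 per position, never below 0. */
    static final IntToDoubleFunction LINEAR = pos -> Math.max(0.0, 1.0 - 0.1 * pos);

    /** Blended score = original weight scaled by the position-dependent coefficient. */
    static double blended(Candidate c, IntToDoubleFunction coefficient) {
        return c.weight * coefficient.applyAsDouble(c.position);
    }

    /**
     * Rescores an over-fetched candidate list (e.g. 100 hits) and keeps the
     * num best (e.g. 10) by blended score, highest first.
     */
    static List<Candidate> rescore(List<Candidate> candidates, int num,
                                   IntToDoubleFunction coefficient) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
                .comparingDouble((Candidate c) -> blended(c, coefficient))
                .reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(num, sorted.size())));
    }

    public static void main(String[] args) {
        // Stand-in for the over-fetched search results, ordered by raw weight.
        List<Candidate> hits = Arrays.asList(
                new Candidate("a very long title about lucene", 15, 5),
                new Candidate("apache lucene", 11, 1),
                new Candidate("lucene in action", 10, 0));
        for (Candidate c : rescore(hits, 2, LINEAR)) {
            System.out.println(c.text + " -> " + blended(c, LINEAR));
        }
    }
}
```

Passing the coefficient as a function keeps the weighting scheme swappable
(for instance a reciprocal variant) without touching the rescoring loop. The
cost of this design is the one the benchmark above hints at: every candidate's
term position has to be looked up before the final top-N can be chosen.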



