[
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Remi Melisson updated LUCENE-5354:
----------------------------------
Attachment: LUCENE-5354_3.patch
Hi!
Here is a new patch that incorporates your comment on the coefficient
calculation (I guess a lambda function would be perfect here!).
I ran the performance test on my laptop; here are the results compared to
AnalyzingInfixSuggester:
-- construction time
AnalyzingInfixSuggester input: 50001, time[ms]: 1780 [+- 367.58]
BlendedInfixSuggester input: 50001, time[ms]: 6507 [+- 2106.52]
-- prefixes: 2-4, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 6804 [+- 1403.13], ~kQPS: 7
BlendedInfixSuggester queries: 50001, time[ms]: 26503 [+- 2624.41], ~kQPS: 2
-- prefixes: 6-9, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 3995 [+- 551.20], ~kQPS: 13
BlendedInfixSuggester queries: 50001, time[ms]: 5355 [+- 1295.41], ~kQPS: 9
-- prefixes: 100-200, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 2626 [+- 588.43], ~kQPS: 19
BlendedInfixSuggester queries: 50001, time[ms]: 1980 [+- 574.16], ~kQPS: 25
-- RAM consumption
AnalyzingInfixSuggester size[B]: 1,430,920
BlendedInfixSuggester size[B]: 1,630,488
If you have any ideas on how we could improve the performance, let me know (see
my comment above regarding your previous suggestion to avoid visiting term
vectors).
> Blended score in AnalyzingInfixSuggester
> ----------------------------------------
>
> Key: LUCENE-5354
> URL: https://issues.apache.org/jira/browse/LUCENE-5354
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spellchecker
> Affects Versions: 4.4
> Reporter: Remi Melisson
> Priority: Minor
> Labels: suggester
> Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch,
> LUCENE-5354_3.patch
>
>
> I'm working on a custom suggester derived from AnalyzingInfixSuggester. I
> require what is called a "blended score" (//TODO ln.399 in
> AnalyzingInfixSuggester) to transform the suggestion weights depending on the
> position of the searched term(s) in the text.
> Right now, I'm using an easy solution:
> If I want 10 suggestions, I search the current ordered index for the first
> 100 results and transform each weight either:
> a) by using the term position in the text (found with TermVector and
> DocsAndPositionsEnum)
> or
> b) by multiplying the weight by the score of a SpanQuery that I add when
> searching
> and then return the 10 suggestions with the highest updated weights.
> Since we usually don't need to suggest many items, the overhead of the larger
> search plus rescoring is not significant, but I agree that this is not the
> most elegant solution.
> We could include this factor (here, the position of the term) directly in
> the index.
> So, I can contribute to this if you think it's worth adding.
> Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a
> dedicated class?
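
For readers following the thread, here is a minimal sketch of the over-fetch
and rescore idea from the quoted description (variant a). It is not the code
from the attached patches and does not use the Lucene API. The Candidate
class, the whitespace-based position lookup, and both coefficient formulas
are assumptions made only for illustration. The coefficient is passed as a
lambda, in the spirit of the remark above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.IntToDoubleFunction;

public class BlendedRescoreSketch {

    /** Hypothetical stand-in for one suggestion fetched from the weight-ordered index. */
    public static final class Candidate {
        public final String text;
        public final double weight;

        public Candidate(String text, double weight) {
            this.text = text;
            this.weight = weight;
        }
    }

    // Illustrative coefficients (assumptions): both shrink as the match moves
    // further from the start of the text.
    public static final IntToDoubleFunction LINEAR = pos -> Math.max(0.0, 1.0 - 0.10 * pos);
    public static final IntToDoubleFunction RECIPROCAL = pos -> 1.0 / (pos + 1);

    /** Index of the first whitespace-separated token that starts with prefix, or -1. */
    public static int firstMatchPosition(String text, String prefix) {
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String p = prefix.toLowerCase(Locale.ROOT);
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].startsWith(p)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Rescores an over-fetched candidate list (for example 100 entries when 10
     * are wanted) and returns the num candidates with the highest blended weight.
     */
    public static List<Candidate> rescore(List<Candidate> candidates, String prefix,
                                          int num, IntToDoubleFunction coefficient) {
        List<Candidate> rescored = new ArrayList<>();
        for (Candidate c : candidates) {
            int pos = firstMatchPosition(c.text, prefix);
            if (pos < 0) {
                continue; // no token matches the prefix: drop the candidate
            }
            rescored.add(new Candidate(c.text, c.weight * coefficient.applyAsDouble(pos)));
        }
        // Highest blended weight first.
        rescored.sort(Comparator.comparingDouble((Candidate c) -> c.weight).reversed());
        return new ArrayList<>(rescored.subList(0, Math.min(num, rescored.size())));
    }

    public static void main(String[] args) {
        List<Candidate> fetched = new ArrayList<>();
        fetched.add(new Candidate("the quick brown fox", 10.0));
        fetched.add(new Candidate("brown bear", 9.0));
        for (Candidate c : rescore(fetched, "bro", 10, LINEAR)) {
            System.out.println(c.text + " -> " + c.weight);
        }
    }
}
```

Swapping LINEAR for RECIPROCAL changes how sharply late matches are
penalized. In a real suggester the position would come from term vectors or
a SpanQuery score, as described above, rather than from re-tokenizing the
text.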