[
https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377513#comment-14377513
]
Michael McCandless commented on LUCENE-6339:
--------------------------------------------
New patch looks great, thanks [~areek]!
In TopSuggestDocsCollector:
- In collect, we seem to assume the suggest searcher will never call
collect more than num times? How is that guaranteed? If so, can you add that to
the javadocs, and maybe add an assert upto < num in collect?
- Can we just allocate scoreDocs up front instead of lazily?
- In the javadocs, instead of "one hit can be..." maybe "one doc can
be..."? Hit is a tricky word in this context since it could be a doc
or a suggestion...
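Here is a minimal sketch of the first two collector points. It uses a hypothetical `BoundedSuggestCollector`, not the patch's actual `TopSuggestDocsCollector`, whose fields and signatures aren't shown here. The arrays are sized to `num` in the constructor instead of lazily, and `collect` asserts `upto < num` so the "never more than num calls" contract is explicit.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for TopSuggestDocsCollector. It assumes the caller
 * delivers docs already in descending weight order and never calls
 * collect() more than num times.
 */
final class BoundedSuggestCollector {
  private final int num;
  private final int[] docs;
  private final float[] scores;
  private int upto = 0;

  BoundedSuggestCollector(int num) {
    if (num <= 0) {
      throw new IllegalArgumentException("num must be > 0, got " + num);
    }
    this.num = num;
    // Allocated eagerly in the constructor instead of on the first collect().
    this.docs = new int[num];
    this.scores = new float[num];
  }

  void collect(int doc, float score) {
    // Makes the contract explicit when assertions are enabled (-ea); with
    // assertions disabled, an extra call fails on the array write instead.
    assert upto < num : "collect called more than num=" + num + " times";
    docs[upto] = doc;
    scores[upto] = score;
    upto++;
  }

  int size() {
    return upto;
  }

  int[] topDocs() {
    return Arrays.copyOf(docs, upto);
  }

  float[] topScores() {
    return Arrays.copyOf(scores, upto);
  }
}
```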
In SuggestIndexSearcher, does it really ever make sense to take a
generic Collector/LeafCollector? Can we instead just strongly type
the params to all the methods to be TopSuggestDocsCollector?
"In case a filter has to be applied, the queue size is doubled" is not
quite correct? Maybe change the logic there so the int queueSize is
first computed, and then if filter is enabled, it's doubled?
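One reading of the queue-size suggestion, written as a hypothetical helper: compute the base size first, then double it only when a filter is present. Using `num` as the base size is an assumption. The patch may compute the base differently.

```java
/**
 * Hypothetical helper: the base queue size is computed first and then
 * doubled only when a filter is applied. Base size == num is an assumption.
 */
final class QueueSizing {
  private QueueSizing() {}

  static int queueSize(int num, boolean hasFilter) {
    if (num <= 0) {
      throw new IllegalArgumentException("num must be > 0, got " + num);
    }
    int queueSize = num;
    if (hasFilter) {
      // Saturate instead of overflowing when doubling very large requests.
      queueSize = queueSize > Integer.MAX_VALUE / 2 ? Integer.MAX_VALUE : queueSize * 2;
    }
    return queueSize;
  }
}
```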
Can we remove the separate WeightProcessor class and just make
encode/decode static methods on NRTSuggester? We can add back
abstractions later if users somehow need control over weight
encoding...
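A minimal sketch of what static `encode`/`decode` methods could look like. It assumes, without confirmation from the patch, that the suggester's FST search prefers the smallest output. Under that assumption, weights are inverted so that higher weights come out first. The class name `WeightCodec` is hypothetical; on `NRTSuggester` these would simply be two static methods.

```java
/**
 * Hypothetical weight codec. Assumes the FST path search returns the
 * smallest outputs first, so a larger weight must encode to a smaller value.
 */
final class WeightCodec {
  private WeightCodec() {}

  static long encode(long weight) {
    if (weight < 0) {
      throw new IllegalArgumentException("weight must be >= 0, got " + weight);
    }
    // Invert: weight 0 -> Long.MAX_VALUE, weight Long.MAX_VALUE -> 0.
    return Long.MAX_VALUE - weight;
  }

  static long decode(long encoded) {
    if (encoded < 0) {
      throw new IllegalArgumentException("encoded value must be >= 0, got " + encoded);
    }
    return Long.MAX_VALUE - encoded;
  }
}
```

Keeping both directions as plain static methods means there is no extra class or object to configure. Callers that really need custom encoding can get an abstraction back later.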
Can we add a test that tests the extreme case of nearly all docs
filtered out and another test with nearly all docs deleted?
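A real test would index `SuggestField`s through `IndexWriter` and compare the suggester's output against an oracle. The hypothetical sketch below shows only that oracle and the two extreme scenarios. For simplicity, it assumes doc id `i` has weight `i`.

```java
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical brute-force oracle for the extreme cases (weight == doc id). */
final class ExtremeCaseReference {
  private ExtremeCaseReference() {}

  /** Top num live, accepted doc ids by descending weight. */
  static List<Integer> expectedTopN(int maxDoc, int num, BitSet liveDocs, IntPredicate filter) {
    return IntStream.range(0, maxDoc)
        .filter(liveDocs::get)
        .filter(filter)
        .boxed()
        .sorted(Comparator.reverseOrder())
        .limit(num)
        .collect(Collectors.toList());
  }

  /** Live docs where everything except the given survivors is deleted. */
  static BitSet nearlyAllDeleted(int maxDoc, int... survivors) {
    BitSet live = new BitSet(maxDoc);
    for (int doc : survivors) {
      live.set(doc);
    }
    return live;
  }

  /** Live docs with nothing deleted. */
  static BitSet allLive(int maxDoc) {
    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc);
    return live;
  }
}
```

In both scenarios the oracle returns only 2 docs for a request of 10. These are the interesting cases to check: the suggester should return the surviving docs in weight order rather than stopping early when its bounded queue fills with rejected candidates.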
> [suggest] Near real time Document Suggester
> -------------------------------------------
>
> Key: LUCENE-6339
> URL: https://issues.apache.org/jira/browse/LUCENE-6339
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 5.0
> Reporter: Areek Zillur
> Assignee: Areek Zillur
> Fix For: 5.0
>
> Attachments: LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch
>
>
> The idea is to index documents with one or more *SuggestField*(s) and be able
> to suggest documents with a *SuggestField* value that matches a given key.
> A SuggestField can be assigned a numeric weight to be used to score the
> suggestion at query time.
> Document suggestion can be done on an indexed *SuggestField*. The document
> suggester can filter out deleted documents in near real-time. The suggester
> can filter out documents based on a Filter (note: may change to a non-scoring
> query?) at query time.
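To make the described semantics concrete, here is a hypothetical in-memory model. It matches the key as a prefix of the suggest value, ranks candidates by weight, and skips deleted and filtered docs. It uses a `TreeMap` purely for illustration; the actual suggester works on a custom postings format, and `SuggestModel` is not a Lucene class.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;

/** Hypothetical model of document suggestion semantics (not the real data structure). */
final class SuggestModel {
  static final class Candidate {
    final int doc;
    final long weight;

    Candidate(int doc, long weight) {
      this.doc = doc;
      this.weight = weight;
    }
  }

  private final TreeMap<String, List<Candidate>> byValue = new TreeMap<>();

  void add(String value, int doc, long weight) {
    byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(new Candidate(doc, weight));
  }

  /** Top num docs whose value starts with key, skipping deleted and filtered docs. */
  List<Integer> suggest(String key, int num, BitSet liveDocs, IntPredicate filter) {
    List<Candidate> matches = new ArrayList<>();
    // Assuming values never contain U+FFFF, every value with this prefix sorts in this range.
    for (Map.Entry<String, List<Candidate>> e :
        byValue.subMap(key, true, key + '\uffff', true).entrySet()) {
      for (Candidate c : e.getValue()) {
        if (liveDocs.get(c.doc) && filter.test(c.doc)) {
          matches.add(c);
        }
      }
    }
    // Descending weight, as the query section describes.
    matches.sort((a, b) -> Long.compare(b.weight, a.weight));
    List<Integer> docs = new ArrayList<>();
    for (int i = 0; i < Math.min(num, matches.size()); i++) {
      docs.add(matches.get(i).doc);
    }
    return docs;
  }
}
```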
> A custom postings format (CompletionPostingsFormat) is used to index
> SuggestField(s) and perform document suggestions.
> h4. Usage
> {code:java}
> // hook up custom postings format
> // indexAnalyzer for SuggestField
> Analyzer analyzer = ...
> IndexWriterConfig config = new IndexWriterConfig(analyzer);
> Codec codec = new Lucene50Codec() {
>   PostingsFormat completionPostingsFormat = new Completion50PostingsFormat();
>
>   @Override
>   public PostingsFormat getPostingsFormatForField(String field) {
>     // isSuggestField(...) is not defined in this snippet; it is assumed to be
>     // an application-supplied check for fields that hold SuggestFields
>     if (isSuggestField(field)) {
>       return completionPostingsFormat;
>     }
>     return super.getPostingsFormatForField(field);
>   }
> };
> config.setCodec(codec);
> // dir is assumed to be a Directory opened elsewhere
> IndexWriter writer = new IndexWriter(dir, config);
> // index some documents with suggestions
> Document doc = new Document();
> doc.add(new SuggestField("suggest_title", "title1", 2));
> doc.add(new SuggestField("suggest_name", "name1", 3));
> writer.addDocument(doc);
> ...
> // open a near-real-time reader from the writer
> DirectoryReader reader = DirectoryReader.open(writer, false);
> // SuggestIndexSearcher is a thin wrapper over IndexSearcher
> // queryAnalyzer will be used to analyze the query string
> SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
>
> // suggest 10 documents for "titl" on "suggest_title" field
> TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
> {code}
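The codec above calls `isSuggestField(field)` without defining it. Here is one hypothetical way to supply it, as an explicit set of field names; a naming convention such as a `suggest_` prefix would work just as well. In the codec snippet, the check would live on the enclosing class rather than in a separate `SuggestFields` object.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical holder for the names of fields indexed as SuggestFields. */
final class SuggestFields {
  private final Set<String> names;

  SuggestFields(String... names) {
    this.names = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(names)));
  }

  /** True if the field should use the completion postings format. */
  boolean isSuggestField(String field) {
    return names.contains(field);
  }
}
```

Whatever the check is, it must return true for exactly the fields that hold `SuggestField`s. Other fields should keep the default postings format.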
> h4. Indexing
> The index analyzer is set through *IndexWriterConfig*.
> {code:java}
> SuggestField(String name, String value, long weight)
> {code}
> h4. Query
> The query analyzer is set through *SuggestIndexSearcher*.
> Hits are collected in descending order of the suggestion's weight.
> {code:java}
> // full options for TopSuggestDocs (TopDocs)
> TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter)
> // full options for Collector
> // note: only collects, does not score
> void suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector)
> {code}
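The `Collector` variant gathers up to `maxNumPerLeaf` suggestions per segment, so results from different leaves have to be merged into one list ordered by weight. Here is a hypothetical sketch of that k-way merge using a `PriorityQueue`. `LeafMerge` and `Suggestion` are illustrative names, and doc ids are assumed to be already rebased to index-wide ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical merge of per-leaf suggestion lists into a global top num. */
final class LeafMerge {
  static final class Suggestion {
    final int doc;
    final long weight;

    Suggestion(int doc, long weight) {
      this.doc = doc;
      this.weight = weight;
    }
  }

  /** Each leaf list must already be sorted by descending weight. */
  static List<Suggestion> mergeTopN(List<List<Suggestion>> leaves, int num) {
    // Cursor {leafIndex, position}; the heap keeps the highest weight on top.
    PriorityQueue<int[]> heap = new PriorityQueue<>(
        (a, b) -> Long.compare(
            leaves.get(b[0]).get(b[1]).weight, leaves.get(a[0]).get(a[1]).weight));
    for (int i = 0; i < leaves.size(); i++) {
      if (!leaves.get(i).isEmpty()) {
        heap.add(new int[] {i, 0});
      }
    }
    List<Suggestion> out = new ArrayList<>();
    while (out.size() < num && !heap.isEmpty()) {
      int[] cur = heap.poll();
      List<Suggestion> leaf = leaves.get(cur[0]);
      out.add(leaf.get(cur[1]));
      // Advance this leaf's cursor if it has more entries.
      if (cur[1] + 1 < leaf.size()) {
        heap.add(new int[] {cur[0], cur[1] + 1});
      }
    }
    return out;
  }
}
```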
> h4. Analyzer
> *CompletionAnalyzer* can wrap another analyzer to tune parameters that apply
> only to suggest fields.
> {code:java}
> CompletionAnalyzer(Analyzer analyzer, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
> {code}
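Here is a rough illustration of what `preserveSep` controls: whether token boundaries survive in the string the suggester matches against. The separator character and the `SepDemo` class are assumptions for illustration only; Lucene's actual separator label and encoding may differ. `preservePositionIncrements` and `maxGraphExpansions` are not modeled.

```java
import java.util.List;

/** Hypothetical illustration of preserveSep; SEP is an assumed separator char. */
final class SepDemo {
  // Assumed placeholder for the separator inserted between tokens.
  static final char SEP = '\u001F';

  /** Joins analyzed tokens, keeping a separator only when preserveSep is true. */
  static String completionKey(List<String> tokens, boolean preserveSep) {
    return String.join(preserveSep ? String.valueOf(SEP) : "", tokens);
  }
}
```

With `preserveSep` false, "foo bar" and "foobar" produce the same key, so the query "foob" matches both. With it true, the separator after "foo" prevents "foob" from matching "foo bar".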