[
https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350351#comment-14350351
]
Simon Willnauer commented on LUCENE-6339:
-----------------------------------------
Hey Areek, I agree with Mike, this looks awesome. Let me give you some comments:
* can we make {{CompletionAnalyzer}} immutable by any chance? I'd really like
to not have setters if possible. For that I guess its constants need to be
public as well?
* is {{private boolean isReservedInputCharacter(char c)}} needed, since we
check it again afterwards in the {{checkKey}} method? Maybe you just wanna
use a switch here?
* In {{CompletionFieldsConsumer#close()}} I think we need to make sure
{{IOUtils.close(dictOut);}} is also called if an exception is hit?
* do we need the extra {{InputStreamDataInput}} in
{{CompletionTermWriter#parse}}, I mean it's a byte input stream so we should be
able to read all of the bytes?
* {{SuggestPayload}} doesn't need a default ctor
* can we use {{ if (success == false) }} instead of {{ if (!success) }} as a
pattern in general?
* use try / finally in {{CompletionFieldsProducer#close()}} to ensure all
resources are closed, or pass both the dict and {{delegateFieldsProducer}} to
{{IOUtils#close()}}?
* you fetch the checksum for the dict file in {{CompletionFieldsProducer#ctor}}
via {{CodecUtil.retrieveChecksum(dictIn);}} but you ignore its return
value, was this intended? I think you don't wanna do that here? Did you intend
to check the entire file?
* I wonder if we should just write one file for both, the index and the FSTs?
What's the benefit from having two?
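The two {{close()}} comments above can be illustrated with a minimal sketch. It
uses only the JDK, so {{TwoResourceProducer}} and {{RecordingCloseable}} are
hypothetical stand-ins and not classes from the patch; the field names merely
mirror the ones mentioned in the review. It also uses the
{{if (success == false)}} style asked for above.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical test double: records that close() ran and can fail on demand. */
final class RecordingCloseable implements Closeable {
  private final boolean failOnClose;
  boolean closed;

  RecordingCloseable(boolean failOnClose) {
    this.failOnClose = failOnClose;
  }

  @Override
  public void close() throws IOException {
    closed = true;
    if (failOnClose) {
      throw new IOException("simulated close failure");
    }
  }
}

/** Hypothetical stand-in for a producer that owns two resources. */
final class TwoResourceProducer implements Closeable {
  private final Closeable dictIn;
  private final Closeable delegateFieldsProducer;

  TwoResourceProducer(Closeable dictIn, Closeable delegateFieldsProducer) {
    this.dictIn = dictIn;
    this.delegateFieldsProducer = delegateFieldsProducer;
  }

  @Override
  public void close() throws IOException {
    boolean success = false;
    try {
      delegateFieldsProducer.close();
      success = true;
    } finally {
      if (success == false) {
        // The first close failed: still close the dict, but swallow any
        // secondary failure so the original exception is the one that propagates.
        try {
          dictIn.close();
        } catch (IOException suppressed) {
          // intentionally ignored, the original exception wins
        }
      } else {
        dictIn.close();
      }
    }
  }
}
```

In Lucene itself the same effect should be available by passing both resources
to {{IOUtils.close(...)}}, which is the shorter of the two options suggested above.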
For loading the dict you put a comment in there saying {{ // is there a better
way of doing this?}}
I think what you need to do is this:
{code}
public synchronized SegmentLookup lookup() throws IOException {
  if (lookup == null) {
    // let multiple fields load concurrently
    try (IndexInput dictClone = dictIn.clone()) {
      dictClone.seek(offset); // this is your field private clone
      lookup = NRTSuggester.load(dictClone);
    }
  }
  return lookup;
}
{code}
I'd appreciate a test that this works just fine, i.e. loading multiple FSTs
concurrently.
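A rough idea of what such a test could check, as a JDK-only sketch:
{{LazyFieldLookup}} and {{ConcurrentLoadCheck}} are hypothetical names, a shared
byte array stands in for the dict file, and copying a slice stands in for
{{dictIn.clone()}} plus {{seek(offset)}}. A real test would go through the
actual producer and {{NRTSuggester.load}} instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-field holder that lazily loads its slice of a shared dict. */
final class LazyFieldLookup {
  private final byte[] dict; // shared by all fields, never mutated
  private final int offset;
  private final int length;
  private final AtomicInteger loads = new AtomicInteger();
  private byte[] lookup;

  LazyFieldLookup(byte[] dict, int offset, int length) {
    this.dict = dict;
    this.offset = offset;
    this.length = length;
  }

  // Synchronized per instance: different fields can load at the same time,
  // while a single field is loaded at most once.
  synchronized byte[] lookup() {
    if (lookup == null) {
      // private copy of this field's slice (stand-in for clone + seek)
      lookup = Arrays.copyOfRange(dict, offset, offset + length);
      loads.incrementAndGet();
    }
    return lookup;
  }

  int loadCount() {
    return loads.get();
  }
}

final class ConcurrentLoadCheck {
  /** Returns true if every thread saw its own field's data and each field loaded once. */
  static boolean run(int numFields, int threadsPerField) {
    final int sliceLength = 4;
    byte[] dict = new byte[numFields * sliceLength];
    for (int i = 0; i < dict.length; i++) {
      dict[i] = (byte) (i / sliceLength); // field f's slice is filled with f
    }
    List<LazyFieldLookup> fields = new ArrayList<>();
    for (int f = 0; f < numFields; f++) {
      fields.add(new LazyFieldLookup(dict, f * sliceLength, sliceLength));
    }
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int f = 0; f < numFields; f++) {
        final LazyFieldLookup field = fields.get(f);
        final byte expected = (byte) f;
        for (int t = 0; t < threadsPerField; t++) {
          Callable<Boolean> task = () -> {
            byte[] data = field.lookup();
            if (data.length != sliceLength) {
              return false;
            }
            for (byte b : data) {
              if (b != expected) {
                return false;
              }
            }
            return true;
          };
          results.add(pool.submit(task));
        }
      }
      boolean ok = true;
      for (Future<Boolean> r : results) {
        ok &= r.get();
      }
      for (LazyFieldLookup field : fields) {
        ok &= field.loadCount() == 1;
      }
      return ok;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

The check asserts two things: no thread ever observes another field's data,
and each field is loaded exactly once no matter how many threads ask for it.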
I didn't get further than this due to the lack of time but I will come back to
this either today or tomorrow. Good stuff Areek
> [suggest] Near real time Document Suggester
> -------------------------------------------
>
> Key: LUCENE-6339
> URL: https://issues.apache.org/jira/browse/LUCENE-6339
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 5.0
> Reporter: Areek Zillur
> Assignee: Areek Zillur
> Fix For: 5.0
>
> Attachments: LUCENE-6339.patch
>
>
> The idea is to index documents with one or more *SuggestField*(s) and be able
> to suggest documents with a *SuggestField* value that matches a given key.
> A SuggestField can be assigned a numeric weight to be used to score the
> suggestion at query time.
> Document suggestion can be done on an indexed *SuggestField*. The document
> suggester can filter out deleted documents in near real-time. The suggester
> can filter out documents based on a Filter (note: may change to a non-scoring
> query?) at query time.
> A custom postings format (CompletionPostingsFormat) is used to index
> SuggestField(s) and perform document suggestions.
> h4. Usage
> {code:java}
> // hook up custom postings format
> // indexAnalyzer for SuggestField
> Analyzer analyzer = ...
> IndexWriterConfig config = new IndexWriterConfig(analyzer);
> Codec codec = new Lucene50Codec() {
> @Override
> public PostingsFormat getPostingsFormatForField(String field) {
> if (isSuggestField(field)) {
> return new
> CompletionPostingsFormat(super.getPostingsFormatForField(field));
> }
> return super.getPostingsFormatForField(field);
> }
> };
> config.setCodec(codec);
> IndexWriter writer = new IndexWriter(dir, config);
> // index some documents with suggestions
> Document doc = new Document();
> doc.add(new SuggestField("suggest_title", "title1", 2));
> doc.add(new SuggestField("suggest_name", "name1", 3));
> writer.addDocument(doc);
> ...
> // open an nrt reader for the directory
> DirectoryReader reader = DirectoryReader.open(writer, false);
> // SuggestIndexSearcher is a thin wrapper over IndexSearcher
> // queryAnalyzer will be used to analyze the query string
> SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader,
> queryAnalyzer);
>
> // suggest 10 documents for "titl" on "suggest_title" field
> TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
> {code}
> h4. Indexing
> Index analyzer set through *IndexWriterConfig*
> {code:java}
> SuggestField(String name, String value, long weight)
> {code}
> h4. Query
> Query analyzer set through *SuggestIndexSearcher*.
> Hits are collected in descending order of the suggestion's weight
> {code:java}
> // full options for TopSuggestDocs (TopDocs)
> TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter)
> // full options for Collector
> // note: only collects, does not score
> void suggest(String field, CharSequence key, int maxNumPerLeaf, Filter
> filter, Collector collector)
> {code}
> h4. Analyzer
> *CompletionAnalyzer* can be used instead to wrap another analyzer and tune
> parameters that apply only to suggest fields.
> {code:java}
> CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer);
> completionAnalyzer.setPreserveSep(..)
> completionAnalyzer.setPreservePositionsIncrements(..)
> completionAnalyzer.setMaxGraphExpansions(..)
> {code}