[jira] [Created] (LUCENE-6445) Highlighter TokenSources simplification; just one getAnyTokenStream()

David Smiley (JIRA) Mon, 20 Apr 2015 11:39:18 -0700

David Smiley created LUCENE-6445:
------------------------------------

             Summary: Highlighter TokenSources simplification; just one getAnyTokenStream()
                 Key: LUCENE-6445
                 URL: https://issues.apache.org/jira/browse/LUCENE-6445
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/highlighter
            Reporter: David Smiley
            Assignee: David Smiley



The Highlighter "TokenSources" class has quite a few utility methods pertaining 
to getting a TokenStream from either term vectors or analyzed text.  I think 
it's too much:
* Some go to term vectors, some don't.  But if you don't want to go to term 
vectors, then it's quite easy for the caller to get the field value and invoke 
the Analyzer on it directly.
* Some methods may return null and some never do; I can't tell which at a 
glance.
* Some methods read the Document (to get a field value) from the IndexReader, 
some don't.  Furthermore, it's not an ideal place to get the doc since your app 
might be using an IndexSearcher with a document cache (e.g. SolrIndexSearcher).
* None of the methods accept a Fields instance from term vectors as a 
parameter.  Given how Lucene's term vector format works, this is a 
performance trap: unless you re-use one instance across the fields of the 
document that you're highlighting, the document's term vectors are fetched 
again for every field.
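
To make the last point concrete, here is a minimal sketch of the calling pattern a single entry point would allow: load the document's term vectors once, then ask for a token stream per field, falling back to analysis when a field has no term vector.  It deliberately uses plain Java stand-ins rather than Lucene classes, so FakeIndex, loadTermVectors, tokensForDocument, the whitespace "analyzer", and the exact shape of getAnyTokenStream are all assumptions for illustration, not the real or proposed API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class TermVectorReuseSketch {

    /** Hypothetical stand-in for an index reader; NOT a Lucene class. */
    static final class FakeIndex {
        int termVectorLoads = 0; // counts how often the (costly) load happens
        private final Map<Integer, Map<String, List<String>>> vectors = new HashMap<>();

        void put(int docId, String field, List<String> tokens) {
            vectors.computeIfAbsent(docId, k -> new HashMap<>()).put(field, tokens);
        }

        /** Models fetching all term vectors of one document in a single call. */
        Map<String, List<String>> loadTermVectors(int docId) {
            termVectorLoads++;
            Map<String, List<String>> tv = vectors.get(docId);
            return tv != null ? tv : Collections.<String, List<String>>emptyMap();
        }
    }

    /** Toy analyzer (assumption): lowercase, then split on whitespace. */
    static final Function<String, List<String>> ANALYZER =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));

    /**
     * Single entry point: prefer the already-loaded term vectors, otherwise
     * analyze the caller-supplied text. Never returns null.
     */
    static List<String> getAnyTokenStream(String field,
                                          Map<String, List<String>> tvFields,
                                          String text) {
        List<String> fromVectors = tvFields.get(field);
        return fromVectors != null ? fromVectors : ANALYZER.apply(text);
    }

    /** Loads term vectors once per document and re-uses them for every field. */
    static Map<String, List<String>> tokensForDocument(FakeIndex index, int docId,
                                                       Map<String, String> storedText) {
        Map<String, List<String>> tvFields = index.loadTermVectors(docId);
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : storedText.entrySet()) {
            result.put(e.getKey(), getAnyTokenStream(e.getKey(), tvFields, e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        // Only "title" has a term vector; "body" must be re-analyzed.
        index.put(0, "title", Arrays.asList("lucene", "highlighter"));

        // Field values supplied by the caller (e.g. from its own document cache).
        Map<String, String> storedText = new LinkedHashMap<>();
        storedText.put("title", "Lucene Highlighter");
        storedText.put("body", "Term vectors are optional");

        System.out.println(tokensForDocument(index, 0, storedText));
        System.out.println("term vector loads: " + index.termVectorLoads);
    }
}
```

In this sketch the caller supplies both the field text and the loaded term vectors, so the helper never touches the index itself, and the load counter stays at 1 no matter how many fields are highlighted.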



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

