[ 
https://issues.apache.org/jira/browse/LUCENE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-5803.
-----------------------------------

    Resolution: Fixed

Thank you also to Shay Banon, who provided the original idea and patch inside 
the ES code tree.

> Add another AnalyzerWrapper class that does not have its own cache, so 
> delegate-only wrappers don't create thread local resources several times
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5803
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5803
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 5.0, 4.10
>
>         Attachments: LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch, 
> LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch
>
>
> This is a followup issue for the following Elasticsearch issue: 
> https://github.com/elasticsearch/elasticsearch/pull/6714
> Basically the problem is the following:
> - Elasticsearch has a pool of Analyzers that are used for analysis in several 
> indexes
> - Each index uses a different PerFieldAnalyzerWrapper
> PerFieldAnalyzerWrapper uses PER_FIELD_REUSE_STRATEGY. Because of this it 
> caches the tokenstreams for every field. If there are many fields, this are a 
> lot. In addition, the underlying analyzers may also cache tokenstreams and 
> other PerFieldAnalyzerWrappers do the same, although the delegate Analyzer 
> can always return the same components.
> We should add similar code to Elasticsearch's directly to Lucene: If the 
> delegating Analyzer just delegates per Field or just wraps CharFilters around 
> the Reader, there is no need to cache the TokenStreamComponents a second time 
> in the delegating Analyzers. This is only needed, if the delegating Analyzers 
> adds additional TokenFilters (like ShingleAnalyzerWrapper).
> We should name this new class DelegatingAnalyzerWrapper extends 
> AnalyzerWrapper. The wrapComponents method must be final, because we are not 
> allowed to add additional TokenFilters, but unlike ES, we don't need to 
> disallow wrapping with CharFilters.
> Internally this class uses a private ReuseStrategy that just delegates to the 
> underlying analyzer. It does not matter here if the strategy of the delegate 
> is global or per field, this is private to the delegate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to