Re: Seemingly very difficult to wrap an Analyzer with CharFilter

Steven Schlansker Wed, 12 Jun 2013 16:03:05 -0700

On Jun 12, 2013, at 3:44 PM, Michael Sokolov <msoko...@safaribooksonline.com> 
wrote:


> You may not have noticed that CharFilter extends Reader.  The expected 
> pattern here is that you chain instances together -- your CharFilter should 
> act as *input* to the Analyzer, I think.  Don't think in terms of extending 
> these analysis classes (except the base ones designed for it): compose them 
> so that each consumes the one before it.
> 

Hi Mike,

Hm, that may work out.  I am a little surprised, because I thought the intention 
was that you set the Analyzer up as part of the configuration and, when you add 
documents, the Analyzer takes care of all text processing.  In particular, this 
means that I now have to ensure that the same transformation is applied at query 
time, and I thought the Analyzer abstraction was supposed to avoid this.

But if this is how it should be done, it could work.  Thanks for the pointer.
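
As a rough illustration of the chaining Mike describes, here is a minimal
sketch that uses only java.io.  PunctuationStrippingReader is a hypothetical
stand-in for a real CharFilter (it does not extend Lucene's CharFilter and does
not correct token offsets), and the rule for what counts as punctuation is an
assumption.  The strip() helper shows one way to push index-time and query-time
text through the same transformation.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Lucene CharFilter: a Reader that drops
// punctuation from whatever Reader it wraps.
final class PunctuationStrippingReader extends FilterReader {

    PunctuationStrippingReader(Reader in) {
        super(in);
    }

    // Assumption: "punctuation" is anything that is not a letter, a digit,
    // or whitespace. Works on UTF-16 code units, so characters outside the
    // Basic Multilingual Plane are dropped as well.
    private static boolean keep(int c) {
        return Character.isLetterOrDigit(c) || Character.isWhitespace(c);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (keep(c)) {
                return c;
            }
        }
        return -1; // end of the wrapped stream
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    // Runs a string through the filter; call it for both the text handed to
    // the Analyzer at index time and the raw query text at search time.
    static String strip(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new PunctuationStrippingReader(new StringReader(text))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Under these assumptions, the filtered Reader (or the stripped string) is what
gets handed to the Analyzer as input, and lowercasing is still left to the
Analyzer itself.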

Steven


> On 6/11/2013 7:52 PM, Steven Schlansker wrote:
>> Hi everyone,
>> 
>> I am trying to add a CharFilter to my Analyzer.  I started with a 
>> StandardAnalyzer wrapped with an ASCIIFoldingFilter.  Then I realized that 
>> it does not handle searches for names that include punctuation well.  For 
>> example, I want a PrefixQuery "pf" to match "P.F. Chang's" or "zaras" to 
>> match "Zara's".
>> 
>> It seems that the easiest plan of attack here is to filter out all 
>> punctuation before analysis.  Per the Analyzer package documentation, that 
>> means I should use a CharFilter.
>> 
>> However, it seems next to impossible to actually insert a CharFilter into 
>> the analyzer!
>> 
>> The JavaDoc for Analyzer.initReader says "Override this if you want to 
>> insert a CharFilter".
>> 
>> If my code extends Analyzer, I can override initReader but I cannot delegate 
>> createComponents to my base StandardAnalyzer, as it is protected.  I cannot 
>> delegate tokenStream to my base analyzer, because it is final.  So a 
>> subclass of Analyzer seemingly cannot use another Analyzer to do its dirty 
>> work.
>> 
>> There is an AnalyzerWrapper class that seems perfect for what I want!  I can 
>> provide a base analyzer and only override the pieces that I want.  Except … 
>> initReader is overridden already to delegate to the base analyzer, and this 
>> override is "final"!  Bummer!
>> 
>> I guess I could have my Analyzer be in the org.apache.lucene.analysis 
>> package and then I can access the protected createComponents method, but 
>> this seems like a disgustingly hacky way to bypass the public API that I 
>> really should use.
>> 
>> Am I missing something glaring here?  How can I amend a StandardAnalyzer to 
>> use a custom CharFilter?
>> 
>> Thanks for any guidance,
>> Steven
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
> 

