RE: Lucene 4.0 PerFieldAnalyzerWrapper question

Mike O'Leary Tue, 25 Sep 2012 17:26:56 -0700

Hi Chris,
In a nutshell, my question is, what should I put in place of ??? to make this 
into a Lucene 4.0 analyzer?


public class MyPerFieldAnalyzer extends Analyzer {
  PerFieldAnalyzerWrapper _analyzer;

  public MyPerFieldAnalyzer() {
    Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();

    analyzerMap.put("IDNumber", new KeywordAnalyzer());
    ...
    ...

    _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
  }

  @Override
  public TokenStreamComponents createComponents(String fieldname, Reader reader) {
    Tokenizer source = ???;
    TokenStream stream = _analyzer.tokenStream(fieldname, reader);
    return new TokenStreamComponents(source, stream);
  }
}

I must be missing something obvious. Can you tell me what it is?
Thanks,
Mike

-----Original Message-----
From: Chris Male [mailto:gento...@gmail.com] 
Sent: Tuesday, September 25, 2012 5:18 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question

Hi Mike,

I don't really understand what problem you're having.

PerFieldAnalyzerWrapper, like all AnalyzerWrappers, uses
Analyzer.PerFieldReuseStrategy, which means it caches the TokenStreamComponents
per field.  The cached TokenStreamComponents are created by retrieving the
wrapped Analyzer through AnalyzerWrapper.getWrappedAnalyzer(String fieldName)
and calling its createComponents.  In PerFieldAnalyzerWrapper,
getWrappedAnalyzer pulls the Analyzer from the Map you provide, falling back
to the default Analyzer for fields that are not in the Map.

Consequently, to use your custom Analyzer and KeywordAnalyzer, all you need to
do is define your custom Analyzer using the new Analyzer API (that is, using
TokenStreamComponents), create your Map from that Analyzer and KeywordAnalyzer,
and pass it into PerFieldAnalyzerWrapper.  This seems to be what you're doing
in your code sample.
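A minimal, Lucene-free Java sketch of the mechanism described above may help. It is a model, not Lucene's API: ToyComponents, ToyAnalyzer, ToyKeywordAnalyzer, ToyCustomAnalyzer and ToyPerFieldWrapper are hypothetical stand-ins for TokenStreamComponents, Analyzer, KeywordAnalyzer, the custom analyzer and PerFieldAnalyzerWrapper. The custom analyzer's whitespace-plus-lowercase chain is an assumption, and the per-field HashMap only approximates Analyzer.PerFieldReuseStrategy. What it shows is that the wrapper never builds a tokenizer itself: each wrapped analyzer's createComponents supplies its own, so no separate map of field names to tokenizers is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical, Lucene-free model of the per-field analysis flow.
// None of these classes belong to Lucene; they only mirror the shape of its 4.0 API.

/** Stand-in for TokenStreamComponents: a source tokenizer plus a filter chain. */
final class ToyComponents {
    final String tokenizerName;
    private final Function<String, List<String>> source;
    private final UnaryOperator<List<String>> filters;

    ToyComponents(String tokenizerName, Function<String, List<String>> source,
                  UnaryOperator<List<String>> filters) {
        this.tokenizerName = tokenizerName;
        this.source = source;
        this.filters = filters;
    }

    /** Tokenize the text, then pass the tokens through the filter chain. */
    List<String> run(String text) {
        return filters.apply(source.apply(text));
    }
}

/** Stand-in for Analyzer: every concrete analyzer builds its own components. */
abstract class ToyAnalyzer {
    abstract ToyComponents createComponents(String fieldName);
}

/** Like KeywordAnalyzer: the whole input is emitted as a single token. */
final class ToyKeywordAnalyzer extends ToyAnalyzer {
    @Override
    ToyComponents createComponents(String fieldName) {
        return new ToyComponents("keyword",
                text -> new ArrayList<>(Arrays.asList(text)),
                tokens -> tokens);
    }
}

/** Hypothetical custom analyzer: whitespace tokenizer followed by lower-casing. */
final class ToyCustomAnalyzer extends ToyAnalyzer {
    @Override
    ToyComponents createComponents(String fieldName) {
        return new ToyComponents("whitespace",
                text -> new ArrayList<>(Arrays.asList(text.trim().split("\\s+"))),
                tokens -> {
                    tokens.replaceAll(String::toLowerCase);
                    return tokens;
                });
    }
}

/** Model of PerFieldAnalyzerWrapper: picks the analyzer per field, caches components per field. */
final class ToyPerFieldWrapper extends ToyAnalyzer {
    private final ToyAnalyzer defaultAnalyzer;
    private final Map<String, ToyAnalyzer> fieldAnalyzers;
    // Rough approximation of Analyzer.PerFieldReuseStrategy: one cached entry per field name.
    private final Map<String, ToyComponents> cache = new HashMap<>();
    int componentsBuilt = 0; // number of cache misses, for illustration only

    ToyPerFieldWrapper(ToyAnalyzer defaultAnalyzer, Map<String, ToyAnalyzer> fieldAnalyzers) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.fieldAnalyzers = new HashMap<>(fieldAnalyzers);
    }

    /** Like getWrappedAnalyzer(String): map lookup, falling back to the default analyzer. */
    ToyAnalyzer getWrappedAnalyzer(String fieldName) {
        return fieldAnalyzers.getOrDefault(fieldName, defaultAnalyzer);
    }

    /** The wrapper has no tokenizer of its own; the wrapped analyzer supplies it. */
    @Override
    ToyComponents createComponents(String fieldName) {
        return getWrappedAnalyzer(fieldName).createComponents(fieldName);
    }

    /** Like Analyzer.tokenStream: build components on first use of a field, then reuse them. */
    List<String> tokenStream(String fieldName, String text) {
        ToyComponents components = cache.get(fieldName);
        if (components == null) {
            components = createComponents(fieldName);
            cache.put(fieldName, components);
            componentsBuilt++;
        }
        return components.run(text);
    }
}
```

With "IDNumber" mapped to the keyword stand-in, tokenStream("IDNumber", "AB-123 X") returns the single token "AB-123 X", while tokenStream("body", "Hello Lucene World") returns hello, lucene and world; repeated calls for the same field reuse the cached components. In Lucene 4.0 itself, PerFieldAnalyzerWrapper is already an Analyzer (it extends AnalyzerWrapper), so one reading of the advice above is that the wrapper instance can be used directly where MyPerFieldAnalyzer was, with no createComponents of your own.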

Are you able to expand on the problem you're encountering?

On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary <tmole...@uw.edu> wrote:

> I am updating an analyzer that uses a particular configuration of the 
> PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the fields 
> use a custom analyzer and StandardTokenizer and the other fields use 
> the KeywordAnalyzer and KeywordTokenizer. The older version of the 
> analyzer looks like this:
>
> public class MyPerFieldAnalyzer extends Analyzer {
>   PerFieldAnalyzerWrapper _analyzer;
>
>   public MyPerFieldAnalyzer() {
>     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
>
>     analyzerMap.put("IDNumber", new KeywordAnalyzer());
>     ...
>     ...
>
>     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
>   }
>
>   @Override
>   public TokenStream tokenStream(String fieldname, Reader reader) {
>     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
>     return stream;
>   }
> }
>
> In older versions of Lucene it is necessary to define a tokenStream
> function, but in 4.0 it is not (in fact, tokenStream is declared
> final, so you can't override it). Instead, it is necessary to define a
> createComponents function that takes the same arguments as the
> tokenStream function and returns a TokenStreamComponents object. The
> TokenStreamComponents constructor has a Tokenizer argument and a
> TokenStream argument. I assume I can just use the same code to provide
> the TokenStream object as was used in the older analyzer's tokenStream
> function, but I don't see how to provide a Tokenizer object, unless it
> is by creating a separate map of field names to Tokenizers that works
> the same way the analyzer map does. Is that the best way to do this,
> or is there a better way? For example, would it be better to inherit
> from AnalyzerWrapper instead of from Analyzer? In that case I would
> need to define the getWrappedAnalyzer and wrapComponents functions,
> but I think I would still need to put the same kind of logic in
> wrapComponents to specify which tokenizer to use with which field.
> It looks like PerFieldAnalyzerWrapper itself assumes that the same
> tokenizer will be used with all fields, as its wrapComponents
> function ignores the fieldname parameter. I would appreciate any help
> in finding the best way to update this analyzer and to write the
> required function(s).

> Thanks,
> Mike
>
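On the wrapComponents question in the quoted message: in an AnalyzerWrapper, createComponents first asks getWrappedAnalyzer(fieldName) for the field's analyzer and lets that analyzer build its components, tokenizer included; only then does wrapComponents see them. A pass-through wrapComponents that ignores the field name, which is what PerFieldAnalyzerWrapper uses, therefore still yields field-specific tokenizers. The sketch below is a hypothetical, Lucene-free model of that template method; SimpleAnalyzer, Comps, WrapperModel, PerFieldModel and AddFilterModel are illustrative names, and components are represented only by tokenizer and filter names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free model of the AnalyzerWrapper template method.
// Components are represented only by the names of their tokenizer and filters.

/** Stand-in for TokenStreamComponents. */
final class Comps {
    final String tokenizer;
    final List<String> filters;

    Comps(String tokenizer, List<String> filters) {
        this.tokenizer = tokenizer;
        this.filters = new ArrayList<>(filters);
    }

    /** Returns a copy with one more filter appended to the chain. */
    Comps withFilter(String filter) {
        List<String> chain = new ArrayList<>(filters);
        chain.add(filter);
        return new Comps(tokenizer, chain);
    }
}

/** Stand-in for Analyzer. */
interface SimpleAnalyzer {
    Comps createComponents(String fieldName);
}

/** Stand-in for AnalyzerWrapper: createComponents is fixed; subclasses fill in two hooks. */
abstract class WrapperModel implements SimpleAnalyzer {
    protected abstract SimpleAnalyzer getWrappedAnalyzer(String fieldName);

    protected abstract Comps wrapComponents(String fieldName, Comps components);

    @Override
    public final Comps createComponents(String fieldName) {
        // The tokenizer is chosen here, by the analyzer wrapped for this field,
        // before wrapComponents ever sees the components.
        return wrapComponents(fieldName, getWrappedAnalyzer(fieldName).createComponents(fieldName));
    }
}

/** Per-field selection with a pass-through wrapComponents. */
final class PerFieldModel extends WrapperModel {
    private final SimpleAnalyzer defaultAnalyzer;
    private final Map<String, SimpleAnalyzer> byField;

    PerFieldModel(SimpleAnalyzer defaultAnalyzer, Map<String, SimpleAnalyzer> byField) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.byField = new HashMap<>(byField);
    }

    @Override
    protected SimpleAnalyzer getWrappedAnalyzer(String fieldName) {
        return byField.getOrDefault(fieldName, defaultAnalyzer);
    }

    @Override
    protected Comps wrapComponents(String fieldName, Comps components) {
        return components; // fieldName not needed: the tokenizer is already field-specific
    }
}

/** Hypothetical wrapper that does use wrapComponents: appends one filter to every field. */
final class AddFilterModel extends WrapperModel {
    private final SimpleAnalyzer inner;
    private final String extraFilter;

    AddFilterModel(SimpleAnalyzer inner, String extraFilter) {
        this.inner = inner;
        this.extraFilter = extraFilter;
    }

    @Override
    protected SimpleAnalyzer getWrappedAnalyzer(String fieldName) {
        return inner;
    }

    @Override
    protected Comps wrapComponents(String fieldName, Comps components) {
        return components.withFilter(extraFilter);
    }
}
```

AddFilterModel illustrates when wrapComponents would actually use its argument: wrapping the per-field model and appending a hypothetical ExtraFilter keeps each field's own tokenizer and adds the filter at the end of that field's chain.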



--
Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
