Hi Chris,

So it sounds like instead of defining a new class that gets instantiated to create an analyzer, I could just do this:
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;

    public class MyPerFieldAnalyzer {
      // Builds the per-field analyzer directly instead of subclassing Analyzer.
      public static Analyzer getMyPerFieldAnalyzer() {
        Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
        analyzerMap.put("IDNumber", new KeywordAnalyzer());
        ...
        ...
        // CustomAnalyzer is used for every field not in analyzerMap.
        return new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
      }
    }

Which is much simpler than all of the things I was thinking I would need to do.

Thanks very much,
Mike

-----Original Message-----
From: Chris Male [mailto:gento...@gmail.com]
Sent: Tuesday, September 25, 2012 6:32 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question

Mike,

On Wed, Sep 26, 2012 at 1:05 PM, Mike O'Leary <tmole...@uw.edu> wrote:

> Hi Chris,
> So if I change my analyzer to inherit from AnalyzerWrapper, I need to
> define a getWrappedAnalyzer function and a wrapComponents function. I
> think getWrappedAnalyzer is straightforward, but I don't understand
> who is calling wrapComponents and for what purpose, so I don't know
> how to define it. This is my modified analyzer code with ??? in the
> places I don't know how to define.
> Thanks,
> Mike
>
> public class MyPerFieldAnalyzer extends AnalyzerWrapper {
>   Map<String, Analyzer> _analyzerMap = new HashMap<String, Analyzer>();
>   Analyzer _defaultAnalyzer;
>
>   public MyPerFieldAnalyzer() {
>     _analyzerMap.put("IDNumber", new KeywordAnalyzer());
>     ...
>     ...
>
>     _defaultAnalyzer = new CustomAnalyzer();
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer;
>
>     if (_analyzerMap.containsKey(fieldName)) {
>       analyzer = _analyzerMap.get(fieldName);
>     } else {
>       analyzer = _defaultAnalyzer;
>     }
>     return analyzer;
>   }
>

I'm not sure if you have missed it, but PerFieldAnalyzerWrapper supports having a default Analyzer.
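A minimal sketch of the AnalyzerWrapper route Mike was attempting, completed along the lines Chris describes in this thread: getWrappedAnalyzer falls back to a default analyzer, and wrapComponents returns the wrapped analyzer's components unchanged. It assumes Lucene 4.0 with lucene-core and lucene-analyzers-common on the classpath. The class name FallbackPerFieldAnalyzer and the tokens helper are hypothetical, and StandardAnalyzer is only a stand-in for Mike's CustomAnalyzer, whose code the thread does not show.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical class name; behaves much like PerFieldAnalyzerWrapper itself.
public class FallbackPerFieldAnalyzer extends AnalyzerWrapper {
  private final Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
  private final Analyzer defaultAnalyzer;

  public FallbackPerFieldAnalyzer() {
    analyzerMap.put("IDNumber", new KeywordAnalyzer());
    // Assumption: StandardAnalyzer stands in for the thread's CustomAnalyzer.
    defaultAnalyzer = new StandardAnalyzer(Version.LUCENE_40);
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    // Use the field-specific analyzer if one is mapped, else the default.
    Analyzer analyzer = analyzerMap.get(fieldName);
    return analyzer != null ? analyzer : defaultAnalyzer;
  }

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName,
      TokenStreamComponents components) {
    // Nothing to add: pass the wrapped analyzer's Tokenizer and filters through.
    return components;
  }

  // Hypothetical helper: collect the terms produced for one field value.
  public static List<String> tokens(Analyzer analyzer, String field, String text)
      throws IOException {
    List<String> result = new ArrayList<String>();
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
    stream.reset();
    while (stream.incrementToken()) {
      result.add(term.toString());
    }
    stream.end();
    stream.close();
    return result;
  }
}
```

Because wrapComponents adds nothing here, this subclass offers little over constructing PerFieldAnalyzerWrapper directly, which is the simplification Mike settles on at the top of the thread.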

>
>   @Override
>   public TokenStreamComponents wrapComponents(String fieldname,
>       TokenStreamComponents components) {
>     Tokenizer tokenizer = ???;
>     TokenStream tokenStream = ???;
>     return new TokenStreamComponents(tokenizer, tokenStream);
>   }
> }
>

wrapComponents is useful when you need to change the components retrieved from the wrapped Analyzer, for example by adding a new Tokenizer or TokenFilter. But you don't need to do this; you can just return the components parameter unchanged.

>
> -----Original Message-----
> From: Chris Male [mailto:gento...@gmail.com]
> Sent: Tuesday, September 25, 2012 5:34 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
>
> Ah I see.
>
> The problem is that we don't really encourage wrapping of Analyzers.
> Your Analyzer wraps a PerFieldAnalyzerWrapper, so it needs to
> extend AnalyzerWrapper, not Analyzer. AnalyzerWrapper handles the
> createComponents call and just requires you to give it the Analyzer(s)
> you've wrapped through getWrappedAnalyzer.
>
> You can avoid all this entirely, of course, by not extending Analyzer
> and instead instantiating a PerFieldAnalyzerWrapper directly in place
> of your MyPerFieldAnalyzer.
>
> On Wed, Sep 26, 2012 at 12:25 PM, Mike O'Leary <tmole...@uw.edu> wrote:
>
> > Hi Chris,
> > In a nutshell, my question is, what should I put in place of ??? to
> > make this into a Lucene 4.0 analyzer?
> >
> > public class MyPerFieldAnalyzer extends Analyzer {
> >   PerFieldAnalyzerWrapper _analyzer;
> >
> >   public MyPerFieldAnalyzer() {
> >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> >
> >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> >     ...
> >     ...
> >
> >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(),
> >         analyzerMap);
> >   }
> >
> >   @Override
> >   public TokenStreamComponents createComponents(String fieldname,
> >       Reader reader) {
> >     Tokenizer source = ???;
> >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> >     return new TokenStreamComponents(source, stream);
> >   }
> > }
> >
> > I must be missing something obvious. Can you tell me what it is?
> > Thanks,
> > Mike
> >
> > -----Original Message-----
> > From: Chris Male [mailto:gento...@gmail.com]
> > Sent: Tuesday, September 25, 2012 5:18 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
> >
> > Hi Mike,
> >
> > I don't really understand what problem you're having.
> >
> > PerFieldAnalyzerWrapper, like all AnalyzerWrappers, uses
> > Analyzer.PerFieldReuseStrategy, which means it caches the
> > TokenStreamComponents per field. The cached TokenStreamComponents
> > are created by retrieving the wrapped Analyzer through
> > AnalyzerWrapper.getWrappedAnalyzer(String fieldName) and calling
> > createComponents on it. In PerFieldAnalyzerWrapper, getWrappedAnalyzer
> > pulls the Analyzer from the Map you provide.
> >
> > Consequently, to use your custom Analyzer and KeywordAnalyzer, all
> > you need to do is define your custom Analyzer using the new Analyzer
> > API (that is, using TokenStreamComponents), create your Map from that
> > Analyzer and KeywordAnalyzer, and pass it into PerFieldAnalyzerWrapper.
> > This seems to be what you're doing in your code sample.
> >
> > Are you able to expand on the problem you're encountering?
> >
> > On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary <tmole...@uw.edu> wrote:
> >
> > > I am updating an analyzer that uses a particular configuration of
> > > the PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the
> > > fields use a custom analyzer and StandardTokenizer, and the other
> > > fields use the KeywordAnalyzer and KeywordTokenizer.
> > > The older version of the analyzer looks like this:
> > >
> > > public class MyPerFieldAnalyzer extends Analyzer {
> > >   PerFieldAnalyzerWrapper _analyzer;
> > >
> > >   public MyPerFieldAnalyzer() {
> > >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> > >
> > >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> > >     ...
> > >     ...
> > >
> > >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(),
> > >         analyzerMap);
> > >   }
> > >
> > >   @Override
> > >   public TokenStream tokenStream(String fieldname, Reader reader) {
> > >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> > >     return stream;
> > >   }
> > > }
> > >
> > > In older versions of Lucene it is necessary to define a
> > > tokenStream function, but in 4.0 it is not (in fact, tokenStream
> > > is declared final, so you can't). Instead, it is necessary to
> > > define a createComponents function that takes the same arguments
> > > as the tokenStream function and returns a TokenStreamComponents
> > > object. The TokenStreamComponents constructor has a Tokenizer
> > > argument and a TokenStream argument. I assume I can just use the
> > > same code to provide the TokenStream object as was used in the
> > > older analyzer's tokenStream function, but I don't see how to
> > > provide a Tokenizer object, unless it is by creating a separate
> > > map of field names to Tokenizers that works the same way the
> > > analyzer map does. Is that the best way to do this, or is there a
> > > better way? For example, would it be better to inherit from
> > > AnalyzerWrapper instead of from Analyzer? In that case I would
> > > need to define getWrappedAnalyzer and wrapComponents functions.
> > > I think in that case I would still need to put the same kind of
> > > logic in the wrapComponents function that specifies which
> > > tokenizer to use with which field, though.
> > > It looks like the
> > > PerFieldAnalyzerWrapper itself assumes that the same tokenizer
> > > will be used with all fields, as its wrapComponents function
> > > ignores the fieldname parameter. I would appreciate any help in
> > > finding out the best way to update this analyzer and to write the
> > > required function(s).
> > >
> > > Thanks,
> > > Mike
> >
> > --
> > Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
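Mike's worry that PerFieldAnalyzerWrapper ties every field to one tokenizer, and Chris's advice to define the custom analyzer with the 4.0 TokenStreamComponents API, can be illustrated with a minimal sketch. It assumes Lucene 4.0 (lucene-core and lucene-analyzers-common). The body of CustomAnalyzer (StandardTokenizer followed by StandardFilter and LowerCaseFilter) is a hypothetical stand-in, since the thread never shows Mike's version, and the PerFieldExample class and tokens helper are illustrative names. Each mapped analyzer builds its own Tokenizer inside its own createComponents, so no separate field-to-Tokenizer map is needed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class PerFieldExample {

  // Hypothetical stand-in for the thread's CustomAnalyzer, written against
  // the Lucene 4.0 API: create the Tokenizer, then chain filters onto it.
  static final class CustomAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      Tokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
      TokenStream stream = new StandardFilter(Version.LUCENE_40, source);
      stream = new LowerCaseFilter(Version.LUCENE_40, stream);
      return new TokenStreamComponents(source, stream);
    }
  }

  // Same shape as Mike's factory method: KeywordAnalyzer (and therefore
  // KeywordTokenizer) for IDNumber, CustomAnalyzer for all other fields.
  public static Analyzer getMyPerFieldAnalyzer() {
    Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
    analyzerMap.put("IDNumber", new KeywordAnalyzer());
    return new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
  }

  // Collect the terms produced for one field value.
  public static List<String> tokens(Analyzer analyzer, String field, String text)
      throws IOException {
    List<String> result = new ArrayList<String>();
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
    stream.reset();
    while (stream.incrementToken()) {
      result.add(term.toString());
    }
    stream.end();
    stream.close();
    return result;
  }
}
```

With this setup an IDNumber value comes back as a single untouched token, while other fields are split and lowercased, even though PerFieldAnalyzerWrapper's own wrapComponents ignores the field name.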