Mike,

On Wed, Sep 26, 2012 at 1:05 PM, Mike O'Leary <tmole...@uw.edu> wrote:
> Hi Chris,
> So if I change my analyzer to inherit from AnalyzerWrapper, I need to
> define a getWrappedAnalyzer function and a wrapComponents function. I
> think getWrappedAnalyzer is straightforward, but I don't understand who is
> calling wrapComponents and for what purpose, so I don't know how to define
> it. This is my modified analyzer code with ??? in the places I don't know
> how to define.
> Thanks,
> Mike
>
> public class MyPerFieldAnalyzer extends AnalyzerWrapper {
>   Map<String, Analyzer> _analyzerMap = new HashMap<String, Analyzer>();
>   Analyzer _defaultAnalyzer;
>
>   public MyPerFieldAnalyzer() {
>     _analyzerMap.put("IDNumber", new KeywordAnalyzer());
>     ...
>     ...
>
>     _defaultAnalyzer = new CustomAnalyzer();
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer;
>     if (_analyzerMap.containsKey(fieldName)) {
>       analyzer = _analyzerMap.get(fieldName);
>     } else {
>       analyzer = _defaultAnalyzer;
>     }
>     return analyzer;
>   }

I'm not sure if you have missed it, but PerFieldAnalyzerWrapper supports having a default Analyzer.

>   @Override
>   public TokenStreamComponents wrapComponents(String fieldname,
>       TokenStreamComponents components) {
>     Tokenizer tokenizer = ???;
>     TokenStream tokenStream = ???;
>     return new TokenStreamComponents(tokenizer, tokenStream);
>   }
> }

wrapComponents is useful when you need to change the components retrieved from the wrapped Analyzer, for example to add a new Tokenizer or TokenFilter. But you don't need to do that here, and can just return the components parameter unchanged.

> -----Original Message-----
> From: Chris Male [mailto:gento...@gmail.com]
> Sent: Tuesday, September 25, 2012 5:34 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
>
> Ah I see.
>
> The problem is that we don't really encourage wrapping of Analyzers. Your
> Analyzer wraps a PerFieldAnalyzerWrapper; consequently it needs to extend
> AnalyzerWrapper, not Analyzer.
> AnalyzerWrapper handles the createComponents call and just requires you
> to give it the Analyzer(s) you've wrapped through getWrappedAnalyzer.
>
> You can avoid all this entirely, of course, by not extending Analyzer but
> instead just instantiating a PerFieldAnalyzerWrapper instance directly
> instead of your MyPerFieldAnalyzer.
>
> On Wed, Sep 26, 2012 at 12:25 PM, Mike O'Leary <tmole...@uw.edu> wrote:
>
> > Hi Chris,
> > In a nutshell, my question is, what should I put in place of ??? to
> > make this into a Lucene 4.0 analyzer?
> >
> > public class MyPerFieldAnalyzer extends Analyzer {
> >   PerFieldAnalyzerWrapper _analyzer;
> >
> >   public MyPerFieldAnalyzer() {
> >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> >
> >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> >     ...
> >     ...
> >
> >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(),
> >         analyzerMap);
> >   }
> >
> >   @Override
> >   public TokenStreamComponents createComponents(String fieldname,
> >       Reader reader) {
> >     Tokenizer source = ???;
> >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> >     return new TokenStreamComponents(source, stream);
> >   }
> > }
> >
> > I must be missing something obvious. Can you tell me what it is?
> > Thanks,
> > Mike
> >
> > -----Original Message-----
> > From: Chris Male [mailto:gento...@gmail.com]
> > Sent: Tuesday, September 25, 2012 5:18 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
> >
> > Hi Mike,
> >
> > I don't really understand what problem you're having.
> >
> > PerFieldAnalyzerWrapper, like all AnalyzerWrappers, uses
> > Analyzer.PerFieldReuseStrategy, which means it caches the
> > TokenStreamComponents per field. The TokenStreamComponents cached are
> > created by retrieving the wrapped Analyzer through
> > AnalyzerWrapper.getWrappedAnalyzer(Field) and calling createComponents.
> > In PerFieldAnalyzerWrapper, getWrappedAnalyzer pulls the Analyzer
> > from the Map you provide.
> >
> > Consequently, to use your custom Analyzers and KeywordAnalyzer, all you
> > need to do is define your custom Analyzer using the new Analyzer API
> > (that is, using TokenStreamComponents), create your Map from that
> > Analyzer and KeywordAnalyzer, and pass it into PerFieldAnalyzerWrapper.
> > This seems to be what you're doing in your code sample.
> >
> > Are you able to expand on the problem you're encountering?
> >
> > On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary <tmole...@uw.edu> wrote:
> >
> > > I am updating an analyzer that uses a particular configuration of
> > > the PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the
> > > fields use a custom analyzer and StandardTokenizer, and the other
> > > fields use the KeywordAnalyzer and KeywordTokenizer. The older
> > > version of the analyzer looks like this:
> > >
> > > public class MyPerFieldAnalyzer extends Analyzer {
> > >   PerFieldAnalyzerWrapper _analyzer;
> > >
> > >   public MyPerFieldAnalyzer() {
> > >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> > >
> > >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> > >     ...
> > >     ...
> > >
> > >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(),
> > >         analyzerMap);
> > >   }
> > >
> > >   @Override
> > >   public TokenStream tokenStream(String fieldname, Reader reader) {
> > >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> > >     return stream;
> > >   }
> > > }
> > >
> > > In older versions of Lucene it is necessary to define a tokenStream
> > > function, but in 4.0 it is not (in fact, tokenStream is declared
> > > final, so you can't). Instead, it is necessary to define a
> > > createComponents function that takes the same arguments as the
> > > tokenStream function and returns a TokenStreamComponents object.
> > > The TokenStreamComponents constructor has a Tokenizer argument and a
> > > TokenStream argument. I assume I can just use the same code to
> > > provide the TokenStream object as was used in the older analyzer's
> > > tokenStream function, but I don't see how to provide a Tokenizer
> > > object, unless it is by creating a separate map of field names to
> > > Tokenizers that works the same way the analyzer map does. Is that
> > > the best way to do this, or is there a better way? For example,
> > > would it be better to inherit from AnalyzerWrapper instead of from
> > > Analyzer? In that case I would need to define getWrappedAnalyzer
> > > and wrapComponents functions. I think in that case I would still
> > > need to put the same kind of logic in the wrapComponents function
> > > that specifies which tokenizer to use with which field, though. It
> > > looks like PerFieldAnalyzerWrapper itself assumes that the same
> > > tokenizer will be used with all fields, as its wrapComponents
> > > function ignores the fieldname parameter. I would appreciate any
> > > help in finding out the best way to update this analyzer and to
> > > write the required function(s).
> > > Thanks,
> > > Mike

> > --
> > Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
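For reference, the advice in this thread can be put together into a complete Lucene 4.0 analyzer along the following lines. This is an untested sketch against the 4.0 AnalyzerWrapper API: `CustomAnalyzer` stands in for Mike's own analyzer class (assumed to be already ported to the createComponents API), and the `IDNumber` field name is taken from the examples above.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.core.KeywordAnalyzer;

public class MyPerFieldAnalyzer extends AnalyzerWrapper {
  private final Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
  private final Analyzer defaultAnalyzer;

  public MyPerFieldAnalyzer() {
    // Keyword-style fields; add further entries as needed.
    analyzerMap.put("IDNumber", new KeywordAnalyzer());

    // CustomAnalyzer is assumed to be your own Analyzer class.
    defaultAnalyzer = new CustomAnalyzer();
  }

  // Pick the per-field Analyzer, falling back to the default.
  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    Analyzer analyzer = analyzerMap.get(fieldName);
    return analyzer != null ? analyzer : defaultAnalyzer;
  }

  // No extra Tokenizer or TokenFilter is needed, so the wrapped
  // Analyzer's components are returned unchanged, as Chris suggests.
  @Override
  protected TokenStreamComponents wrapComponents(String fieldName,
      TokenStreamComponents components) {
    return components;
  }
}
```

As Chris also points out, the subclass can be skipped entirely by building the same map and instantiating `new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap)` directly.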