Hi,

Regarding Uwe's warning, the SnowballFilter Javadoc says:

"NOTE: SnowballFilter expects lowercased text." [1]

[1] https://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html

On Monday, November 10, 2014 4:43 PM, Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

> Uwe
>
> Thanks for the reply. Given that SnowballAnalyzer is made up of a series of
> filters, I was thinking about something like this, where I 'pipe' the output
> of one filter into the next:
>
> standardTokenizer = new StandardTokenizer(...);
> standardFilter = new StandardFilter(standardTokenizer, ...);
> stopFilter = new StopFilter(standardFilter, ...);
> snowballFilter = new SnowballFilter(stopFilter, ...);
>
> but leaving out LowerCaseFilter. Does this make sense?

Exactly. Create a clone of SnowballAnalyzer (from the Lucene source package) in
your own package and remove LowerCaseFilter. But be aware that Snowball may need
lowercased terms to stem correctly!!! I don't know this filter well; I just want
to make you aware of it.

The same applies to StopFilter, but there you can handle it: make StopFilter
case-insensitive (there is a boolean for this):

StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)

Uwe

> Martin O'Shea.
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: 10 Nov 2014 14:06
> To: java-user@lucene.apache.org
> Subject: RE: How to disable LowerCaseFilter when using SnowballAnalyzer in
> Lucene 3.0.2
>
> Hi,
>
> In general, you cannot change Analyzers; they are "examples" and can be seen
> as "best practice". If you want to modify one, write your own Analyzer
> subclass that uses the Tokenizer and TokenFilters you want. You can, for
> example, clone the source code of the original and remove LowerCaseFilter.
> Analyzers are very simple: there is no logic in them, just some
> "configuration" (which Tokenizer and which TokenFilters). In later Lucene 3
> releases and in Lucene 4 this is even simpler: you just override
> createComponents in your Analyzer class and put your "configuration" there.
>
> If you use Apache Solr or Elasticsearch, you can define your analyzers in XML
> or JSON configuration.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
> > From: Martin O'Shea [mailto:m.os...@dsl.pipex.com]
> > Sent: Monday, November 10, 2014 2:54 PM
> > To: java-user@lucene.apache.org
> > Subject: How to disable LowerCaseFilter when using SnowballAnalyzer in
> > Lucene 3.0.2
> >
> > I realise that 3.0.2 is an old version of Lucene, but suppose I have Java
> > code as follows:
> >
> > int nGramLength = 3;
> > Set<String> stopWords = new HashSet<String>();
> > stopWords.add("the");
> > stopWords.add("and");
> > ...
> > SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30,
> >     "English", stopWords);
> > ShingleAnalyzerWrapper shingleAnalyzer = new
> >     ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);
> >
> > I use this to generate n-gram frequencies from a particular string of text
> > without stop words. How can I disable the LowerCaseFilter that forms part
> > of SnowballAnalyzer? I want to preserve the case of the generated n-grams
> > so that I can perform various counts according to the presence or absence
> > of upper-case characters in the n-grams.
> >
> > I am something of a Lucene newbie, and I should add that upgrading the
> > version of Lucene is not an option here.
> >
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
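A minimal sketch of the SnowballAnalyzer clone Uwe describes, assuming Lucene 3.0.2 with lucene-core and the contrib snowball jar on the classpath. The class name CaseSensitiveSnowballAnalyzer and the terms() helper are hypothetical, not part of Lucene. It rebuilds the chain Martin lists (StandardTokenizer, StandardFilter, StopFilter, SnowballFilter) without LowerCaseFilter, and passes ignoreCase = true to the StopFilter constructor Uwe quotes, so "The" is dropped as well as "the". Given the Javadoc note above, stems of capitalized words may differ from those of their lowercased forms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

/**
 * Hypothetical clone of SnowballAnalyzer for Lucene 3.0.2 with LowerCaseFilter removed.
 * Chain: StandardTokenizer -> StandardFilter -> StopFilter(ignoreCase) -> SnowballFilter.
 */
public final class CaseSensitiveSnowballAnalyzer extends Analyzer {
    private final Version matchVersion;
    private final String stemmerName;   // e.g. "English"
    private final Set<?> stopWords;     // may be null (no stop-word removal)

    public CaseSensitiveSnowballAnalyzer(Version matchVersion, String stemmerName, Set<?> stopWords) {
        this.matchVersion = matchVersion;
        this.stemmerName = stemmerName;
        this.stopWords = stopWords;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(matchVersion, reader);
        result = new StandardFilter(result);
        // No LowerCaseFilter here, so the original case of each token is preserved.
        if (stopWords != null) {
            // ignoreCase = true makes the stop set match "The" as well as "the";
            // position increments follow the Version-based default.
            result = new StopFilter(
                    StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
                    result, stopWords, true);
        }
        // Caveat: SnowballFilter expects lowercased text, so capitalized words may stem differently.
        return new SnowballFilter(result, stemmerName);
    }

    /** Hypothetical helper: collects the terms an analyzer emits for a string (Lucene 3.0 TermAttribute API). */
    public static List<String> terms(Analyzer analyzer, String text) throws IOException {
        TokenStream stream = analyzer.tokenStream("body", new StringReader(text));
        TermAttribute term = stream.addAttribute(TermAttribute.class);
        List<String> out = new ArrayList<String>();
        stream.reset();
        while (stream.incrementToken()) {
            out.add(term.term());
        }
        stream.end();
        stream.close();
        return out;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stopWords = new HashSet<String>();
        stopWords.add("the");
        stopWords.add("and");
        Analyzer analyzer = new CaseSensitiveSnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
        // Prints the stemmed, case-preserved terms with "The", "and" and "the" removed.
        System.out.println(terms(analyzer, "The London bridge and the Thames"));
    }
}
```

To keep the n-gram setup from the original question, an instance can be passed to ShingleAnalyzerWrapper (which lives in the contrib analyzers jar) where the SnowballAnalyzer was used, e.g. new ShingleAnalyzerWrapper(new CaseSensitiveSnowballAnalyzer(Version.LUCENE_30, "English", stopWords), nGramLength). Because of the lowercasing caveat, it is worth checking the stems of a few capitalized words before relying on the case-based counts.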