Thanks Uwe.

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: 10 Nov 2014 14 43
To: java-user@lucene.apache.org
Subject: RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

Hi,

> Uwe
>
> Thanks for the reply. Given that SnowballAnalyzer is made up of a
> series of filters, I was thinking about something like this where I
> 'pipe' output from one filter to the next:
>
> standardTokenizer = new StandardTokenizer(...);
> standardFilter = new StandardFilter(standardTokenizer,...);
> stopFilter = new StopFilter(standardFilter,...);
> snowballFilter = new SnowballFilter(stopFilter,...);
>
> But ignore LowerCaseFilter. Does this make sense?

Exactly. Create a clone of SnowballAnalyzer (from the Lucene source package) in your own package and remove LowerCaseFilter. But be aware that Snowball may need lowercased terms to stem correctly! I don't know this filter well, I just want to make you aware. The same applies to the stop filter, but that one lets you handle it: you should make the stop filter case-insensitive (there is a boolean for this):

StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)

Uwe

> Martin O'Shea.
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: 10 Nov 2014 14 06
> To: java-user@lucene.apache.org
> Subject: RE: How to disable LowerCaseFilter when using
> SnowballAnalyzer in Lucene 3.0.2
>
> Hi,
>
> In general, you cannot change Analyzers; they are "examples" and can
> be seen as "best practise". If you want to modify one, write your own
> Analyzer subclass which uses the Tokenizers and TokenFilters you
> want. You can for example clone the source code of the original
> and remove LowerCaseFilter. Analyzers are very simple, there is no
> logic in them, it's just some "configuration" (which Tokenizer and
> which TokenFilters). In later Lucene 3 and Lucene 4, this is very
> simple: you just need to override createComponents in the Analyzer
> class and add your "configuration" there.
>
> If you use Apache Solr or Elasticsearch you can create your analyzers
> by XML or JSON configuration.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Martin O'Shea [mailto:m.os...@dsl.pipex.com]
> > Sent: Monday, November 10, 2014 2:54 PM
> > To: java-user@lucene.apache.org
> > Subject: How to disable LowerCaseFilter when using SnowballAnalyzer
> > in Lucene 3.0.2
> >
> > I realise that 3.0.2 is an old version of Lucene but if I have Java
> > code as follows:
> >
> > int nGramLength = 3;
> > Set<String> stopWords = new HashSet<String>();
> > stopWords.add("the");
> > stopWords.add("and");
> > ...
> > SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
> > ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);
> >
> > Which will generate the frequency of ngrams from a particular
> > string of text without stop words, how can I disable the
> > LowerCaseFilter which forms part of the SnowballAnalyzer? I want to
> > preserve the case of the ngrams generated so that I can perform
> > various counts according to the presence / absence of upper case
> > characters in the ngrams.
> >
> > I am something of a Lucene newbie. And I should add that upgrading
> > the version of Lucene is not an option here.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
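
To make the goal of the thread concrete, here is a small plain-Java sketch, deliberately written without Lucene, of the behaviour a case-preserving chain would need: stop words are matched case-insensitively (the idea behind the ignoreCase flag Uwe mentions), the surviving tokens keep their original case, and word n-grams are then checked for upper-case characters. The class and method names (CasePreservingShingles, removeStopWords, shingles, hasUpperCase) are hypothetical and exist only for this illustration. It is a simplification: it does no stemming, it emits only n-grams of exactly n tokens, and it is not meant to reproduce the exact output of SnowballFilter or ShingleAnalyzerWrapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not Lucene code and not a Lucene API.
public class CasePreservingShingles {

    // Drops stop words by comparing a lowercased copy of each token,
    // but returns the surviving tokens in their original case.
    // Assumes the stop-word set itself contains lowercase entries.
    public static List<String> removeStopWords(List<String> tokens, Set<String> lowerCasedStopWords) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            if (!lowerCasedStopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Builds word n-grams of exactly n tokens, joined by single spaces.
    public static List<String> shingles(List<String> tokens, int n) {
        List<String> result = new ArrayList<String>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return result;
    }

    // Reports whether the text contains at least one upper-case character.
    public static boolean hasUpperCase(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (Character.isUpperCase(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "and"));
        List<String> tokens = Arrays.asList(
                "The", "Quick", "brown", "Fox", "and", "the", "lazy", "Dog");

        List<String> kept = removeStopWords(tokens, stopWords);
        for (String shingle : shingles(kept, 3)) {
            System.out.println(shingle + " | contains upper case: " + hasUpperCase(shingle));
        }
    }
}
```

In a real Lucene 3.0.2 analyzer the same ordering question arises: if LowerCaseFilter is removed, the stop filter has to ignore case itself, and whether the Snowball stemmer copes with mixed-case input still needs to be tested, as Uwe cautions.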