RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

Martin O'Shea Mon, 10 Nov 2014 06:20:08 -0800

Uwe

Thanks for the reply. Given that SnowballAnalyzer is made up of a series of 
filters, I was thinking about something like this, where I 'pipe' the output 
from one filter to the next:


standardTokenizer = new StandardTokenizer(...);
standardFilter = new StandardFilter(standardTokenizer,...);
stopFilter = new StopFilter(standardFilter,...);
snowballFilter = new SnowballFilter(stopFilter,...);

That is, the chain simply leaves out LowerCaseFilter. Does this make sense?
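
To illustrate what such a chain does once lower-casing is left out, here is a 
minimal plain-Java sketch. It does not use Lucene at all: the class 
CasePreservingChain and its methods are hypothetical stand-ins for the 
tokenizer, the stop filter and the shingle step, and stemming is omitted. It 
assumes the stop words are stored in lower case and compared 
case-insensitively, because without LowerCaseFilter an exact-match stop list 
would let "The" through. As far as I know the Snowball stemmers also expect 
lower-case input, so stemming mixed-case tokens may behave inconsistently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of a case-preserving chain; none of these names are Lucene classes.
final class CasePreservingChain {

    // Stand-in for a tokenizer: split on whitespace, original case untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for a stop filter: compare in lower case, but emit the original token.
    static List<String> removeStopWords(List<String> tokens, Set<String> lowerCaseStopWords) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            if (!lowerCaseStopWords.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Stand-in for the shingle step: word n-grams of exactly n tokens (a simplification).
    static List<String> shingles(List<String> tokens, int n) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    // True if the n-gram contains at least one upper-case character.
    static boolean hasUpperCase(String gram) {
        for (int i = 0; i < gram.length(); i++) {
            if (Character.isUpperCase(gram.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "and"));
        List<String> kept = removeStopWords(tokenize("The Quick brown Fox and the Lazy dog"), stopWords);
        for (String gram : shingles(kept, 3)) {
            System.out.println(gram + " -> upper case present: " + hasUpperCase(gram));
        }
    }
}
```

The point of the sketch is only the ordering: stop words are matched without 
regard to case, while the tokens handed to the n-gram step keep their 
original case, so counts based on upper-case characters remain possible.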

Thanks

Martin O'Shea.
-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: 10 Nov 2014 14:06
To: java-user@lucene.apache.org
Subject: RE: How to disable LowerCaseFilter when using SnowballAnalyzer in 
Lucene 3.0.2

Hi,

In general, you cannot change the bundled Analyzers; they are "examples" and 
can be seen as "best practice". If you want to modify one, write your own 
Analyzer subclass that uses the Tokenizers and TokenFilters you want. You can, 
for example, copy the source code of the original and remove LowerCaseFilter. 
Analyzers are very simple; there is no logic in them, just some 
"configuration" (which Tokenizer and which TokenFilters). In later Lucene 3 
releases and in Lucene 4 this is very simple: you just need to override 
createComponents in the Analyzer class and add your "configuration" there.

If you use Apache Solr or Elasticsearch you can create your analyzers by XML or 
JSON configuration.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Martin O'Shea [mailto:m.os...@dsl.pipex.com]
> Sent: Monday, November 10, 2014 2:54 PM
> To: java-user@lucene.apache.org
> Subject: How to disable LowerCaseFilter when using SnowballAnalyzer in 
> Lucene 3.0.2
> 
> I realise that 3.0.2 is an old version of Lucene, but if I have Java code as 
> follows:
> 
> int nGramLength = 3;
> // Set is an interface, so instantiate a concrete HashSet
> Set<String> stopWords = new HashSet<String>();
> stopWords.add("the");
> stopWords.add("and");
> ...
> SnowballAnalyzer snowballAnalyzer =
>     new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
> ShingleAnalyzerWrapper shingleAnalyzer =
>     new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);
> 
> which will generate the frequency of ngrams from a particular string of 
> text without stop words, how can I disable the LowerCaseFilter that forms 
> part of the SnowballAnalyzer? I want to preserve the case of the ngrams 
> generated so that I can perform various counts according to the presence / 
> absence of upper case characters in the ngrams.
> 
> I am something of a Lucene newbie. And I should add that upgrading the 
> version of Lucene is not an option here.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

