I realise that 3.0.2 is an old version of Lucene, but I have Java code as
follows:

 

int nGramLength = 3;
// Set is an interface, so a concrete HashSet is used here
Set<String> stopWords = new HashSet<String>();
stopWords.add("the");
stopWords.add("and");
...
SnowballAnalyzer snowballAnalyzer =
    new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
ShingleAnalyzerWrapper shingleAnalyzer =
    new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);

 

This lets me generate n-gram frequencies from a given string of text, with
stop words removed. How can I disable the LowerCaseFilter that forms part of
the SnowballAnalyzer? I want to preserve the case of the generated n-grams
so that I can perform various counts according to the presence or absence
of upper-case characters in them.
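
To make the goal concrete, here is a rough plain-Java sketch of the kind of case-preserving count I am after. It does not use Lucene at all: there is no stemming and tokenising is just a whitespace split. The class name CasePreservingNGrams and its methods are made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: plain Java, no Lucene, no stemming.
class CasePreservingNGrams {

    // Splits on whitespace and drops stop words. Stop words are matched
    // case-insensitively, but the surviving tokens keep their original case.
    static List<String> tokens(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()
                    && !stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Counts every run of exactly n consecutive tokens, case preserved.
    static Map<String, Integer> ngramFrequencies(List<String> tokens, int n) {
        Map<String, Integer> freq = new LinkedHashMap<String, Integer>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            Integer old = freq.get(gram);
            freq.put(gram, old == null ? 1 : old + 1);
        }
        return freq;
    }

    // True if the n-gram contains at least one upper-case character.
    static boolean hasUpperCase(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isUpperCase(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "and"));
        List<String> kept = tokens("The Quick brown fox and the lazy dog", stopWords);
        for (Map.Entry<String, Integer> e : ngramFrequencies(kept, 3).entrySet()) {
            System.out.println(e.getKey() + " x" + e.getValue()
                    + (hasUpperCase(e.getKey()) ? " [has upper case]" : ""));
        }
    }
}
```

For the sample sentence, only the trigram "Quick brown fox" would be flagged as containing an upper-case character. That distinction is exactly what gets lost once everything has been lower-cased.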

 

I am something of a Lucene newbie, and I should add that upgrading the
version of Lucene is not an option here.
