Hi Mathieu,
You can't add TokenFilters to an existing Analyzer. However,
implementing an Analyzer that acts just like the StandardAnalyzer
plus your Stemmer is pretty straightforward.
StandardAnalyzer.tokenStream() looks like:
/** Constructs a {@link StandardTokenizer} filtered by a
{@link StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
public TokenStream tokenStream(String fieldName, Reader reader) {
  TokenStream result = new StandardTokenizer(reader);
  result = new StandardFilter(result);
  result = new LowerCaseFilter(result);
  result = new StopFilter(result, stopSet);
  // ADD your Stemming Filter here, or one line above if your Stop word
  // list works off of stemmed words
  return result;
}
So just create a new Analyzer that has these same filters, plus your
stemming TokenFilter. Looking at the source of SnowballAnalyzer
(contrib/snowball) may also be useful.
FWIW, it is not that hard to make a "configurable" analyzer similar
to what Solr does, if you find you need to change the filters in your
analyzer a lot.
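As a rough illustration of the "configurable" idea (this is neither Solr's code nor the Lucene API), the sketch below models an analyzer as an ordered list of stages supplied by the caller. ConfigurableChain, lowerCase(), stop() and toyStem() are made-up names, the whitespace split only stands in for StandardTokenizer, and toyStem() just strips a trailing "s" where a real stemming TokenFilter would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for an analyzer whose filter chain is chosen by the caller. */
final class ConfigurableChain {
    // Each stage maps a token list to a token list, much as a TokenFilter wraps a TokenStream.
    private final List<UnaryOperator<List<String>>> stages;

    ConfigurableChain(List<UnaryOperator<List<String>>> stages) {
        this.stages = new ArrayList<>(stages);
    }

    /** Splits on whitespace (a placeholder tokenizer), then applies the stages in order. */
    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        for (UnaryOperator<List<String>> stage : stages) {
            tokens = stage.apply(tokens);
        }
        return tokens;
    }

    /** Lower-cases every token. */
    static UnaryOperator<List<String>> lowerCase() {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    /** Drops tokens that appear in the given stop set. */
    static UnaryOperator<List<String>> stop(Set<String> stopWords) {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) {
                if (!stopWords.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Toy "stemmer": strips one trailing "s" from tokens longer than 3 characters. */
    static UnaryOperator<List<String>> toyStem() {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) {
                boolean plural = t.length() > 3 && t.endsWith("s");
                out.add(plural ? t.substring(0, t.length() - 1) : t);
            }
            return out;
        };
    }
}
```

With a stop list that holds stemmed forms such as "cat", a chain of lowerCase(), toyStem(), stop() drops "Cats" from the input, while lowerCase(), stop(), toyStem() lets "cat" through. That is the same ordering question raised in the comment in tokenStream() above.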
Cheers,
Grant
On Mar 5, 2007, at 1:25 PM, DECAFFMEYER MATHIEU wrote:
Hi,
This is a very simple question, but I just can't find the
resources I need.
I am using the StandardAnalyzer:
StandardAnalyzer stdAnalyzer;
if ((stopWordList != null) && (stopWordList.length != 0)) {
  stdAnalyzer = new StandardAnalyzer(stopWordList);
} else {
  stdAnalyzer = new StandardAnalyzer();
}
What I want to achieve is to be able to use an English stemmer,
but I can't find any method to associate my stemmer with my Analyzer.
I appreciate any help, thank you.
Mathieu Decaffmeyer
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ