Re: Using Stemmers

Grant Ingersoll Mon, 05 Mar 2007 14:44:28 -0800

Hi Mathieu,

You can't add TokenFilters to an existing Analyzer. However, implementing an Analyzer that acts just like the StandardAnalyzer plus your stemmer is pretty straightforward. StandardAnalyzer.tokenStream() looks like:

  /** Constructs a {@link StandardTokenizer} filtered by a {@link StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);

    // ADD your stemming filter here, or one line above if your stop word
    // list works off of stemmed words

    return result;
  }

So just create a new Analyzer that has these same filters, plus your stemming TokenFilter. Looking at the source of SnowballAnalyzer (contrib/snowball) may also be useful.

FWIW, it is not that hard to make a "configurable" analyzer similar to what Solr does, if you find you need to change the filters in your analyzer a lot.
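
To make the "configurable" idea and the filter-ordering point concrete, here is a rough stand-alone sketch. It deliberately does not use any Lucene classes: ConfigurableAnalyzerSketch, Filter, lowerCase(), stop() and toyStem() are all hypothetical names made up for illustration, and toyStem() is a crude suffix stripper, not a real Porter or Snowball stemmer. It only models the pattern of applying a caller-supplied list of filters in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy model of a configurable analyzer; these are NOT Lucene classes. */
class ConfigurableAnalyzerSketch {

    /** Hypothetical stand-in for a TokenFilter: maps one token list to another. */
    interface Filter extends UnaryOperator<List<String>> {}

    private final List<Filter> filters;

    /** The filter chain is supplied by the caller, so it can be changed freely. */
    ConfigurableAnalyzerSketch(Filter... filters) {
        this.filters = Arrays.asList(filters);
    }

    /** Splits on non-alphanumeric characters, then applies each filter in order. */
    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        for (Filter f : filters) {
            tokens = f.apply(tokens);
        }
        return tokens;
    }

    /** Lower-cases every token. */
    static Filter lowerCase() {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    /** Drops every token that appears in the given stop word list. */
    static Filter stop(String... words) {
        Set<String> stopSet = new HashSet<>(Arrays.asList(words));
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) {
                if (!stopSet.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Crude plural stripper for illustration only; not Porter or Snowball. */
    static Filter toyStem() {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) {
                boolean strip = t.length() > 3 && t.endsWith("s");
                out.add(strip ? t.substring(0, t.length() - 1) : t);
            }
            return out;
        };
    }

    public static void main(String[] args) {
        // The stop list holds the stemmed form "cat", so the stop filter
        // only matches when it runs after the stemmer.
        ConfigurableAnalyzerSketch stopAfterStem =
            new ConfigurableAnalyzerSketch(lowerCase(), toyStem(), stop("cat"));
        ConfigurableAnalyzerSketch stopBeforeStem =
            new ConfigurableAnalyzerSketch(lowerCase(), stop("cat"), toyStem());

        System.out.println(stopAfterStem.analyze("Cats chase dogs"));  // [chase, dog]
        System.out.println(stopBeforeStem.analyze("Cats chase dogs")); // [cat, chase, dog]
    }
}
```

The two chains in main() differ only in where the stop filter sits, which is the same choice as in the comment in tokenStream() above: put the stemmer before the stop filter only if your stop word list contains stemmed forms.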


Cheers,
Grant


On Mar 5, 2007, at 1:25 PM, DECAFFMEYER MATHIEU wrote:

Hi,
This is a very simple question, but I just can't find the resources I need ...
I am using the StandardAnalyzer:
StandardAnalyzer stdAnalyzer;
if ((stopWordList != null) && (stopWordList.length != 0)) {
    stdAnalyzer = new StandardAnalyzer(stopWordList);
} else {
    stdAnalyzer = new StandardAnalyzer();
}
What I want to achieve is to be able to use an English stemmer,
but I can't find any method to associate my stemmer with my Analyzer.
I appreciate any help, thank you.

__________________________________

   Mathieu Decaffmeyer
   Web Developer
   Fortis Banque Luxembourg
   50, avenue J. F. Kennedy
   L-2951 Luxembourg
   IS Retail Banking - Web Content Management
   Mobile : 0032  479 / 69 . 42 . 96





--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
