Re: Stemming Problem

2010-05-19 Thread Larry Hendrix
Thanks for the advice. I want to keep the capitalization because in our application we are mining specific contact and company names from news articles. About 99% of the time if we match a contact or company and it's capitalized we avoid false matches. --Larry On May 18, 2010, at 7:46 PM, Eric

Re: Stemming Problem

2010-05-18 Thread Erick Erickson
You can construct your own analyzer by creating it from a pre-existing Tokenizer (e.g. WhitespaceTokenizer) and any number of TokenFilters (e.g. TokenFilter). You can string any number of TokenFilters together to get many different effects. But I have to ask, why do you want to keep capitalization?
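
To illustrate the chaining idea in the message above, here is a minimal plain-Java sketch. It does not use Lucene's classes: `whitespaceTokenize`, `naiveStem`, and `analyze` are hypothetical stand-ins for a Tokenizer, a stemming TokenFilter, and an Analyzer, and the suffix stripper is a toy, not the Porter or Snowball algorithm. It assumes that "keep capitalization" means leaving capitalized tokens untouched and stemming only the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class FilterChainSketch {
    // Stand-in for a tokenizer: split on whitespace, keeping case and punctuation.
    static List<String> whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Toy suffix stripper (an assumption for illustration, not Porter/Snowball).
    // Capitalized tokens are returned unchanged so that names survive intact.
    static String naiveStem(String token) {
        if (token.isEmpty() || Character.isUpperCase(token.charAt(0))) {
            return token;
        }
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Stand-in for an analyzer: tokenize, then apply each filter in order.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<String>... filters) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            analyze("Acme Corp is working with partners", FilterChainSketch::naiveStem));
    }
}
```

One caveat with this design: a word that is capitalized only because it starts a sentence is also skipped by the stemmer, so whether that trade-off is acceptable depends on the data.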

RE: Stemming Problem

2010-05-18 Thread Christopher Condit
Hi Larry- > Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having > problems with stemming. Does anyone have a recommendation for other > text analyzers that handle stemming and also keep capitalization, stop words, > and punctuation? Have you tried the SnowballFilter? You co

Stemming Problem

2010-05-18 Thread Larry Hendrix
Hi, Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having problems with stemming. Does anyone have a recommendation for other text analyzers that handle stemming and also keep capitalization, stop words, and punctuation? Thanks, Larry Larry A. Hendrix, Graduate Student C

Re: Porter stemming problem

2007-06-22 Thread Steven Rowe
Hi Rob, Robert Walpole wrote: > At the moment I am attempting to do this as follows... > > analyzer = new PorterStemAnalyzer(); > parser = new QueryParser("content", analyzer); > Query query = parser.parse("keywords: relaxing"); > Hits hits = idxSearcher.search(query); > > ...but this is not ret

Re: Porter stemming problem

2007-06-22 Thread Erick Erickson
Yes, you should also stem the query terms. Otherwise, you'll have indexed "working" as "work", but your search for "working" will look for "working" and won't match. Which is not what you want, I'm sure. Query.toString() will tell you a lot about how queries are processed, BTW. In general, un
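
The "working" versus "work" mismatch described in the message above can be reproduced with a small plain-Java sketch. Nothing here is Lucene API: the index is just a set of strings, and `toyStem`, `index`, `matchesRaw`, and `matchesStemmed` are hypothetical names for illustration. The stemmer is a toy suffix stripper, not the Porter algorithm.

```java
import java.util.HashSet;
import java.util.Set;

public class QueryStemSketch {
    // Toy stemmer (an assumption for illustration): lowercases and strips a trailing "ing".
    static String toyStem(String term) {
        String t = term.toLowerCase();
        return (t.length() > 5 && t.endsWith("ing")) ? t.substring(0, t.length() - 3) : t;
    }

    // "Index" the terms by storing only their stemmed forms.
    static Set<String> index(String... terms) {
        Set<String> idx = new HashSet<>();
        for (String term : terms) {
            idx.add(toyStem(term));
        }
        return idx;
    }

    // Look the query term up exactly as typed (no stemming at query time).
    static boolean matchesRaw(Set<String> idx, String queryTerm) {
        return idx.contains(queryTerm);
    }

    // Apply the same stemmer to the query term before the lookup.
    static boolean matchesStemmed(Set<String> idx, String queryTerm) {
        return idx.contains(toyStem(queryTerm));
    }

    public static void main(String[] args) {
        Set<String> idx = index("working");
        System.out.println("raw query matches:     " + matchesRaw(idx, "working"));
        System.out.println("stemmed query matches: " + matchesStemmed(idx, "working"));
    }
}
```

The raw lookup fails because the index only contains "work"; running the query term through the same stemmer used at index time makes the two sides agree.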

Porter stemming problem

2007-06-22 Thread Robert Walpole
Hi, I am using the PorterStemAnalyzer class (attached) to provide stemming for a Lucene index. To stem the terms in the index we use the following... //open an index writer in append mode IndexWriter idxWriter = new IndexWriter(LUCENE_INDEX_PATH, new PorterStemAnalyzer(), false); //add the luce