Thanks for the advice. I want to keep the capitalization because in our
application we are mining specific contact and company names from news
articles. About 99% of the time, if we match a contact or company and it's
capitalized, we avoid a false match.
--Larry
On May 18, 2010, at 7:46 PM, Eric
You can construct your own analyzer by creating
it from a pre-existing Tokenizer
(e.g. WhitespaceTokenizer) and any number
of TokenFilters (e.g. TokenFilter). You can
string any number of TokenFilters together
to get many different effects.
But I have to ask: why do you want to keep capitalization?
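
To illustrate the tokenizer-plus-filters idea above, here is a minimal
sketch in plain Java. It does not use Lucene's API: FilterChainSketch,
whitespaceTokenize, stemUnlessCapitalized and analyze are hypothetical
names, and the "stemmer" is a toy suffix stripper, not the Porter or
Snowball algorithm. It only shows how a whitespace split followed by a
chain of per-token filters can stem ordinary words while leaving
capitalized names untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Plain-Java illustration only; these are NOT Lucene classes.
class FilterChainSketch {

    // Tokenizer stage: split on whitespace only, so case and punctuation survive.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical filter: a toy suffix stripper (assumption, not a real
    // stemming algorithm) that skips capitalized tokens so names pass through.
    static String stemUnlessCapitalized(String token) {
        if (token.isEmpty() || Character.isUpperCase(token.charAt(0))) {
            return token;
        }
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        return token;
    }

    // Analyzer stage: run the tokenizer, then apply each filter in order.
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> chain = new ArrayList<>();
        chain.add(FilterChainSketch::stemUnlessCapitalized);
        System.out.println(analyze("Acme Holdings is working", chain));
    }
}
```

More filters can be appended to the chain list in the order they should
run, which is the same composition idea as stringing TokenFilters together.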
Hi Larry-
> Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having
> problems with stemming. Does anyone have a recommendation for other
> text analyzers that handle stemming and also keep capitalization, stop words,
> and punctuation?
Have you tried the SnowballFilter? You co
Hi,
Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having
problems with stemming. Does anyone have a recommendation for other text
analyzers that handle stemming and also keep capitalization, stop words, and
punctuation?
Thanks,
Larry
Larry A. Hendrix, Graduate Student
C
Hi Rob,
Robert Walpole wrote:
> At the moment I am attempting to do this as follows...
>
> analyzer = new PorterStemAnalyzer();
> parser = new QueryParser("content", analyzer);
> Query query = parser.parse("keywords: relaxing");
> Hits hits = idxSearcher.search(query);
>
> ...but this is not ret
Yes, you should also stem the query terms. Otherwise, you'll have
indexed "working" as "work", but your search for "working" will look
for "working" and won't match. Which is not what you want, I'm sure.
Query.toString() will tell you a lot about how queries are
processed, BTW.
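
The index-time versus query-time mismatch described above can be sketched
in plain Java. This is not Lucene code: StemmedIndexSketch and its methods
are hypothetical names, and toyStem is a stand-in suffix stripper rather
than the Porter algorithm. The sketch only shows that a term stored in
stemmed form is found when the query term goes through the same stemmer,
and missed when it does not.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene's index or query API.
class StemmedIndexSketch {

    // Maps each stemmed term to the ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Toy stand-in for a stemmer (assumption): lowercase, then strip "ing".
    static String toyStem(String term) {
        String t = term.toLowerCase();
        if (t.length() > 5 && t.endsWith("ing")) {
            return t.substring(0, t.length() - 3);
        }
        return t;
    }

    // Index time: every token is stemmed before it is stored.
    void addDocument(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(toyStem(token), k -> new HashSet<>()).add(docId);
        }
    }

    // Query time WITHOUT stemming: looks up the term exactly as typed.
    Set<Integer> searchRaw(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // Query time WITH stemming: uses the same stemmer as indexing.
    Set<Integer> searchStemmed(String term) {
        return postings.getOrDefault(toyStem(term), Collections.emptySet());
    }

    public static void main(String[] args) {
        StemmedIndexSketch index = new StemmedIndexSketch();
        index.addDocument(1, "she is working late");
        System.out.println("raw:     " + index.searchRaw("working"));
        System.out.println("stemmed: " + index.searchStemmed("working"));
    }
}
```

The design point is simply that the same analysis has to run on both
sides; which analyzer does the work is secondary.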
In general, un
Hi,
I am using the PorterStemAnalyzer class (attached) to provide stemming
for a Lucene index.
To stem the terms in the index we use the following...
//open an index writer in append mode
IndexWriter idxWriter = new IndexWriter(LUCENE_INDEX_PATH, new
PorterStemAnalyzer(), false);
//add the luce