Ganesh wrote:
My opinion is Stemming process is to get the base word. Here it is not
doing so.
Unfortunately, this is where your problem lies: stemming doesn't do this.
It reduces words that are lexically almost equivalent to a common stem,
so that cat = cats.
From the wiki: "*Stemming* is the process for reducing inflected (or
sometimes derived) words to their stem
<http://en.wikipedia.org/wiki/Word_stem>, base or root
<http://en.wikipedia.org/wiki/Root_%28linguistics%29> form – generally a
written word form. The stem need not be identical to the morphological
root <http://en.wikipedia.org/wiki/Morphological_root> of the word; it
is usually sufficient that related words map to the same stem, even if
this stem is not in itself a valid root. The algorithm
<http://en.wikipedia.org/wiki/Algorithm> has been a long-standing
problem in computer science
<http://en.wikipedia.org/wiki/Computer_science>; the first paper on the
subject was published in 1968. The process of stemming, often called
*conflation <http://en.wikipedia.org/wiki/Conflation>*, is useful in
search engines <http://en.wikipedia.org/wiki/Search_engine> for query
expansion <http://en.wikipedia.org/wiki/Query_expansion> or indexing
<http://en.wikipedia.org/wiki/Index_%28search_engine%29> and other
natural language processing
<http://en.wikipedia.org/wiki/Natural_language_processing> problems."
But the words hard and harder mean different things (in the opinion of
those who developed the Snowball algorithm), and as such they shouldn't
be stemmed down to a single word.
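
To make the idea of conflation concrete, here is a minimal sketch of a
toy suffix stripper. It is NOT the Snowball algorithm; the class name
ToyStemmer and its three rules are made up purely for illustration. It
shows related words mapping to one stem (cat/cats), a stem that is not
itself a real word (pony/ponies both become "poni"), and a pair that
such rules leave apart (hard/harder).

```java
import java.util.Locale;

// Toy suffix stripper for illustration only; this is NOT Snowball.
final class ToyStemmer {
    private ToyStemmer() {}

    // Reduce a word to a crude stem using three hand-picked rules.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // "ponies" -> "poni": drop the trailing "es".
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 2);
        }
        // "cats" -> "cat": drop a plural "s", but leave "ss" endings alone.
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        // "pony" -> "poni": turn a final "y" after a consonant into "i".
        if (w.length() > 2 && w.endsWith("y")
                && "aeiou".indexOf(w.charAt(w.length() - 2)) < 0) {
            return w.substring(0, w.length() - 1) + "i";
        }
        return w;
    }

    // Two words conflate when they reduce to the same stem.
    static boolean conflates(String a, String b) {
        return stem(a).equals(stem(b));
    }
}
```

With these rules ToyStemmer.conflates("cat", "cats") is true, while
ToyStemmer.conflates("hard", "harder") is false, because no rule here
strips the comparative "-er".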
Now, I find it arguable whether hard and harder are close enough to
stem to the same root, but to get this effect you will need to either
change the Snowball algorithm or process your words into a more basic
form before they go into the stemmer, which is a hairy road indeed ^^
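
A rough sketch of that second option, assuming you maintain your own
lookup table of forms you want collapsed. PreStemNormalizer and the
table contents are hypothetical names for illustration, not part of
Lucene or Snowball. The stemmer is passed in as a plain function, so
whatever stemmer you actually use can be plugged in; in Lucene this
logic would presumably live in a filter placed ahead of the Snowball
one.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper: maps selected words to a base form, then stems.
final class PreStemNormalizer {
    private final Map<String, String> baseForms;
    private final UnaryOperator<String> stemmer;

    PreStemNormalizer(Map<String, String> baseForms,
                      UnaryOperator<String> stemmer) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.baseForms = new HashMap<>(baseForms);
        this.stemmer = stemmer;
    }

    // Lowercase, apply the hand-built table, then hand off to the stemmer.
    String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        String base = baseForms.getOrDefault(w, w);
        return stemmer.apply(base);
    }
}
```

With a table containing harder -> hard and hardest -> hard, both words
reach the stemmer as "hard". The hairy part is building and maintaining
that table: every irregular or comparative form you care about has to
be listed by hand.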
Hope this helps.
Matt
--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
(207) 288-6012