Hi Jong,

Jong Kim wrote:
> I'm looking for a stemmer that is capable of returning all morphological
> variants of a query term (to be used for high-recall search). For example,
> given a query term of 'cares', I would like to be able to generate 'cares',
> 'care', 'cared', and 'caring'.

To achieve high recall, can't you just stem terms both when indexing and
when querying?
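
As a rough illustration of what I mean, here is a minimal sketch in
plain Python. The suffix-stripping toy_stem() function and the tiny
in-memory index are hypothetical stand-ins for a real stemmer and a
real search index, not any particular library's API:

```python
from collections import defaultdict

# Toy suffix stripper (an assumption for illustration only; a real
# system would use a proper stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "e", "s")

def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters of the word after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    # Map each stemmed token to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index

def search(index, term):
    # Stem the query term the same way the indexed tokens were stemmed.
    return set(index.get(toy_stem(term), set()))

docs = {
    1: "she cares deeply",
    2: "he cared once",
    3: "caring for plants",
    4: "nothing relevant here",
}
index = build_index(docs)
print(search(index, "cares"))
```

Because 'cares', 'care', 'cared', and 'caring' all reduce to the same
stem in this toy setup, a query for any one of them finds documents
containing any of the others, without generating the variants at all.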

The only reasons I can think of not to do so are (1) to support both
high-precision and high-recall queries with the same index, or (2) to
give greater weight to exact-match documents than to stemmed-match
documents within a single query.  If either of these is the case, you
could maintain two fields (one stemmed and one non-stemmed) or two
indexes, and either choose which field/index to query (reason #1) or
combine the two in a single query (reason #2).

Actually, now that I think about it more, two indexes for reason #2
(i.e., exact matches ranked first, with stemmed matches as a fallback)
would be tricky: the two result lists would have to be fused, and
scores computed against separate indexes are not directly comparable
-- better to use two fields in a single index for this case.
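
Here is a minimal sketch of the two-field idea, again in plain Python
with a hypothetical toy_stem() function and an in-memory index rather
than a real search library. The field names ("exact", "stemmed") and
the boost values are assumptions chosen only for illustration:

```python
from collections import defaultdict

# Toy suffix stripper (illustrative assumption, not a real stemmer).
SUFFIXES = ("ing", "ed", "es", "e", "s")

def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    # One index, two fields: the raw token and its stem.
    index = {"exact": defaultdict(set), "stemmed": defaultdict(set)}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index["exact"][token].add(doc_id)
            index["stemmed"][toy_stem(token)].add(doc_id)
    return index

def search(index, term, exact_boost=2.0, stemmed_boost=1.0):
    # Combine both fields in one query; exact matches earn a larger boost.
    term = term.lower()
    scores = defaultdict(float)
    for doc_id in index["exact"].get(term, ()):
        scores[doc_id] += exact_boost
    for doc_id in index["stemmed"].get(toy_stem(term), ()):
        scores[doc_id] += stemmed_boost
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    1: "he cared once",
    2: "she cares deeply",
    3: "caring for plants",
}
index = build_index(docs)
print(search(index, "cares"))
```

The document containing the literal term 'cares' matches in both
fields and so ranks above the documents that match only on the stem.
For reason #1, a high-precision query would simply look the term up in
the "exact" field alone.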

Steve

