This is an automatically generated Delivery Status Notification.

Delivery to the following recipients failed.

       java-user@lucene.apache.org



Reporting-MTA: dns;av.mimer.no
Received-From-MTA: dns;av.mimer.no
Arrival-Date: Tue, 17 Oct 2006 06:52:24 +0200

Final-Recipient: rfc822;java-user@lucene.apache.org
Action: failed
Status: 5.5.0
Diagnostic-Code: smtp;550 SPF forgery: Please see http://www.openspf.org/why.html?sender=java-user%40lucene.apache.org&ip=213.184.200.35&receiver=asf.osuosl.org
--- Begin Message ---
Hi Jong,

Jong Kim wrote:
> I'm looking for a stemmer that is capable of returning all morphological 
> variants  of a query term (to be used for high-recall search). For example, 
> given a query term of 'cares', I would like to be able to generate 'cares', 
> 'care', 'cared', and 'caring'.

To achieve high recall, can't you just stem terms both when indexing and
when querying?

The only reasons I can think of not to do so are to support both
high-precision and high-recall queries with the same index, or to give
greater weight to exact-match documents than to stemmed-match documents
within a single query.  If either of these is the case, you could
maintain two fields (one stemmed and one non-stemmed) or two indexes,
and choose which field/index to use (reason #1) or combine the two in a
single query (reason #2).

Actually, now that I think about it more, two indexes for reason #2
(i.e., stemmed match as fallback if exact match fails) would be tricky,
due to issues with fusion of search results -- better to use two fields
in a single index for this case.

Steve


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--- End Message ---

Reply via email to