Re: lucene suggest

Karl Wettin Wed, 22 Aug 2007 05:38:38 -0700


On 21 Aug 2007, at 13:10, Jens Grivolla wrote:

On 8/21/07, Heba Farouk <[EMAIL PROTECTED]> wrote:
the documents are not duplicated, i mean the hits (assume that 2 documents have the same subject but with different authors, so if i'm searching the subject, the returned hits will have duplicates)
i was asking if i can remove duplicates from the hits?
You may not want to work with documents at all (where you have the
duplicates), but rather with the terms in your index directly.  Take a
look at WildcardTermEnum etc.
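
To illustrate why working on terms avoids the duplicates: a term dictionary holds each distinct value once, no matter how many documents contain it. The sketch below is not Lucene's WildcardTermEnum; it assumes a plain java.util.TreeSet standing in for the index's sorted term dictionary, and the class name TermDictionary is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for an index's sorted term dictionary; not a Lucene class.
class TermDictionary {
    private final NavigableSet<String> terms = new TreeSet<>();

    // Adding the same subject twice is a no-op, so duplicates never appear.
    void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    // Returns up to `limit` distinct terms starting with `prefix`, in sorted order.
    List<String> withPrefix(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // tailSet(p, true) starts the scan at the first term >= prefix.
        for (String t : terms.tailSet(p, true)) {
            if (!t.startsWith(p) || out.size() >= limit) {
                break; // sorted order: once the prefix stops matching, we are done
            }
            out.add(t);
        }
        return out;
    }
}
```

Two documents with the same subject and different authors contribute a single entry here, so the suggestion list needs no de-duplication step afterwards.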

My favorite solution for this is a standalone trie; one such solution is available in LUCENE-625.
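
For readers who have not used one, here is a minimal sketch of a trie used for prefix suggestions. It is not the code attached to LUCENE-625; the class name PrefixTrie and its methods are made up for illustration, and it assumes suggestions should come back in alphabetical order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone trie; unrelated to the LUCENE-625 implementation.
class PrefixTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so traversal yields alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean terminal; // true if a stored phrase ends at this node
    }

    private final Node root = new Node();

    // Inserts a phrase; inserting it again changes nothing.
    void add(String phrase) {
        Node n = root;
        for (char c : phrase.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.terminal = true;
    }

    // Returns up to `limit` stored phrases that start with `prefix`.
    List<String> suggest(String prefix, int limit) {
        Node n = root;
        for (char c : prefix.toCharArray()) {
            n = n.children.get(c);
            if (n == null) {
                return new ArrayList<>(); // no stored phrase has this prefix
            }
        }
        List<String> out = new ArrayList<>();
        collect(n, new StringBuilder(prefix), out, limit);
        return out;
    }

    // Depth-first walk below the prefix node, stopping once `limit` is reached.
    private void collect(Node n, StringBuilder sb, List<String> out, int limit) {
        if (out.size() >= limit) {
            return;
        }
        if (n.terminal) {
            out.add(sb.toString());
        }
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            char c = e.getKey();
            sb.append(c);
            collect(e.getValue(), sb, out, limit);
            sb.setLength(sb.length() - 1); // backtrack
        }
    }
}
```

Lookup cost depends on the prefix length and the number of results returned, not on the number of stored phrases, which is what makes the structure attractive for type-ahead.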


Another way is to create an n-gram index.
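
A rough sketch of the n-gram idea, in plain Java rather than as a Lucene index: each phrase is split into character n-grams, each gram points back to the phrases containing it, and candidates are ranked by how many grams they share with the input. The class name NGramIndex and the ranking rule are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory n-gram index; a real setup would store grams as index terms.
class NGramIndex {
    private final int n;
    private final Map<String, Set<String>> postings = new HashMap<>();

    NGramIndex(int n) {
        this.n = n;
    }

    // Splits a string into overlapping character n-grams, e.g. "solr" -> sol, olr for n = 3.
    static List<String> grams(String s, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            out.add(s.substring(i, i + n));
        }
        return out;
    }

    void add(String phrase) {
        for (String g : grams(phrase, n)) {
            postings.computeIfAbsent(g, k -> new TreeSet<>()).add(phrase);
        }
    }

    // Ranks phrases by the number of distinct grams shared with the query; ties sort alphabetically.
    List<String> suggest(String query, int limit) {
        Map<String, Integer> score = new HashMap<>();
        for (String g : new HashSet<>(grams(query, n))) {
            for (String p : postings.getOrDefault(g, Collections.emptySet())) {
                score.merge(p, 1, Integer::sum);
            }
        }
        List<String> out = new ArrayList<>(score.keySet());
        out.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return out.size() > limit ? new ArrayList<>(out.subList(0, limit)) : out;
    }
}
```

Unlike a trie, this tolerates typos in the middle of the input: a misspelled query still shares some grams with the intended phrase.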

It is usually a good idea to create an "a priori" corpus with a limited set of data. I prefer common user queries rather than items in the index, especially if your corpus is large and you have a lot of server load.
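
One way such a limited corpus could be built from a query log is sketched below: normalize the logged queries, count them, and keep only the most frequent ones as suggestion candidates. The class name QueryLogCorpus, the normalization (trim plus lowercase), and the cut-off by rank are all assumptions, not something prescribed here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that turns a raw query log into a small suggestion corpus.
class QueryLogCorpus {
    // Returns the `limit` most frequent queries, most frequent first; ties sort alphabetically.
    static List<String> topQueries(List<String> queryLog, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String q : queryLog) {
            String norm = q.trim().toLowerCase(Locale.ROOT);
            if (!norm.isEmpty()) {
                counts.merge(norm, 1, Integer::sum);
            }
        }
        List<String> out = new ArrayList<>(counts.keySet());
        out.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return out.size() > limit ? new ArrayList<>(out.subList(0, limit)) : out;
    }
}
```

The resulting list is what would then be loaded into the trie or n-gram structure, keeping the suggestion data small regardless of how large the main index is.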

Try LUCENE-550 as the a priori index. My guess is that it would outperform a RAMDirectory by 20x at 25,000 title-sized documents (40 chars on average).



--
karl

