Re: Indexing of virtual "made up" documents

Erik Hatcher Tue, 26 Apr 2005 18:50:47 -0700

On Apr 26, 2005, at 4:46 PM, Paul Libbrecht wrote:

On 26 Apr 05, at 15:00, Erik Hatcher wrote:
I am not sure how Lucene uses the position information, but in the case I described, where I concatenate all my features into a whitespace-delimited text, I fear that Lucene uses the placement of features in this made-up text and comes to some wrong conclusions (after all, the placement is arbitrary in the "made-up" text).
What wrong conclusions do you fear here? Again, the position information is used for phrase queries, but in your situation you wouldn't be using phrase queries so no need to concern yourself with the position stuff at all.
Some information retrieval approaches hold that terms appearing early in a document should receive a higher score. Is there nothing like that in Lucene's scoring?

No, Lucene doesn't have that feature, at least not explicitly. It could be hacked, sort of, by injecting multiple copies of the same term at the same position (to get a higher term frequency) for the earlier terms. Back to the original question: the position information will not adversely affect scoring.
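
To make the hack concrete, here is a rough sketch in plain Python. It is a toy model and not Lucene's API: the function names and the (term, position increment) pairs are made up for illustration. Extra copies of the early terms are emitted with a position increment of 0, so they stack on the same position as the original token and only the term frequency changes. It also shows that reordering the features leaves the term frequencies untouched.

```python
from collections import Counter


def inject_early_boost(tokens, boosted_prefix=2, copies=3):
    """Return a list of (term, position_increment) pairs.

    The first `boosted_prefix` tokens are emitted `copies` times each.
    The extra copies carry a position increment of 0, so they share a
    position with the original token. (Hypothetical helper, toy model.)
    """
    stream = []
    for i, term in enumerate(tokens):
        stream.append((term, 1))  # the original token advances the position
        if i < boosted_prefix:
            # duplicates stay on the same position as the original
            stream.extend((term, 0) for _ in range(copies - 1))
    return stream


def term_frequencies(stream):
    """Count how often each term occurs, ignoring positions entirely."""
    return Counter(term for term, _ in stream)


def positions(stream):
    """Resolve position increments into absolute positions."""
    pos = -1
    resolved = []
    for term, increment in stream:
        pos += increment
        resolved.append((term, pos))
    return resolved


features = ["red", "round", "small", "shiny"]

plain = inject_early_boost(features, boosted_prefix=0)
shuffled = inject_early_boost(list(reversed(features)), boosted_prefix=0)
boosted = inject_early_boost(features, boosted_prefix=2, copies=3)

# Order of the made-up text does not change the term frequencies.
print(term_frequencies(plain) == term_frequencies(shuffled))

# Early terms now count three times; later terms still count once.
print(term_frequencies(boosted))

# The duplicates did not shift the positions of later terms.
print(positions(boosted))
```

How much the extra frequency actually moves the score depends on the similarity formula in use, so the number of copies would need tuning against real queries.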

        Erik

