Andrzej, I have been trying to solve a similar problem where I need to boost score based on the document type. Your approach is very interesting and I want to give it a try.
I have an implementation-specific question. When you say to put as many "1" as the boost needs to be, do you mean that the resulting field should look like "1 1 1 1 1" or "1,1,1,1,1", so that the content is tokenized and indexed?

Suman

On 12/21/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Martin Braun wrote:
> Hi Daniel,
>
>>> so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should
>>> get a boost of 1.1975 .
>>>
>> The boost is stored with a limited resolution. Try boosting one doc by 10,
>> the other one by 20 or something like that.
>>
>
> You're right. I thought that with the float values the resolution should
> be good enough!
> But there is only a difference in the score with a boosting diff of 0.2
> (e.g. 1.7 and 1.9).
>
> I know that there were many questions on the list regarding scoring
> better new documents.
> But I want to avoid any overhead like "FunctionQuery" at query time,
> and in my case I have some documents
> which have same values in many fields (=>same score) and the only
> difference is the year.
>
> However I don't want to overboost the score so that the scoring for
> other criteria is not considered.
>
> Shortly spoken: As a result of a search I have a list of book titles and
> I want a sort by score AND by year of publication.
>
> But for performance reasons I want to avoid this sorting at query-time
> by boosting at index time.
>
> Is that possible?
>

Here's the trick that works for me, without the issues of boost resolution or FunctionQuery.

Add a separate field, say "days", in which you put as many "1" tokens as there are days elapsed since the epoch (not necessarily since 1 Jan 1970; pick a date that makes sense for you). Then, if you want to prioritize newer documents, just add "+days:1" to your query. Voila: the final result is the sum of the other score factors plus a score factor that is higher for more recent documents, which contain more 1s.

If you are dealing with large time spans, you can split this into years and days-in-a-year, and apply query boosts, like "+years:1^10.0 +days:1^0.02". Do some experiments and find what works best for you.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
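
For illustration, here is one way the "days" and "years" field values described in the thread might be generated before indexing. This is only a rough sketch in Python, not something taken from the thread: the reference date `EPOCH` and the helper names `days_field` and `years_and_days_fields` are made up for the example, and it assumes the field is indexed with an analyzer that splits on whitespace, so that "1 1 1 1 1" yields five separate "1" tokens. It only builds the strings; it does not model how Lucene turns term frequency into a score, which depends on the Similarity in use.

```python
from datetime import date

# Hypothetical reference date. The thread only says to pick one that
# makes sense for your data; 2000-01-01 is an arbitrary choice here.
EPOCH = date(2000, 1, 1)


def days_field(doc_date, epoch=EPOCH):
    """Return space-separated "1" tokens, one per day elapsed since epoch."""
    elapsed = (doc_date - epoch).days
    if elapsed < 0:
        raise ValueError("document date precedes the chosen epoch")
    return " ".join(["1"] * elapsed)


def years_and_days_fields(doc_date, epoch=EPOCH):
    """Return (years, days) values for the split variant.

    years: one "1" per calendar year since the epoch's year.
    days:  one "1" per day within the document's own year.
    """
    years = doc_date.year - epoch.year
    if years < 0:
        raise ValueError("document date precedes the chosen epoch")
    day_of_year = doc_date.timetuple().tm_yday
    return " ".join(["1"] * years), " ".join(["1"] * day_of_year)


def token_count(field_value):
    """Count whitespace-separated tokens, as a whitespace analyzer would see them."""
    return len(field_value.split())
```

The returned strings would then be stored in the corresponding fields of each document, and a query such as "+years:1^10.0 +days:1^0.02" would match on them. Any separator should work as long as the analyzer configured for the field splits on it; the space is just the assumption made here.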