When you write your query, you can add a date range with a boost factor for this field, i.e. boost by a factor x the documents that have a date of today, boost by x-1 the documents from the past week, boost by x-2 the documents from the past two weeks, etc.
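A minimal sketch of this approach. It makes two hypothetical assumptions. First, each document's date is indexed as an untokenized `yyyyMMdd` string in a field called `date`. Second, the resulting string is handed to Lucene's QueryParser, whose syntax accepts a `^boost` suffix on a `[from TO to]` range clause. The tiers are disjoint: today, then the seven days before it, then the seven days before those. As a result, a document matches at most one date clause. The user's query is marked required with `+`, so the date clauses only reorder matching documents and never add new ones.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DateBoostQuery {
    // yyyyMMdd, e.g. 2006-05-16 -> "20060516" (hypothetical index format for the date field)
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    /**
     * One boosted range clause per tier. Tier 0 is today (boost maxBoost);
     * tier k >= 1 is the k-th week back (boost maxBoost - k). Tiers stop
     * once the boost would reach 0.
     */
    static List<String> boostClauses(String field, LocalDate today, int maxBoost) {
        List<String> clauses = new ArrayList<>();
        String todayStr = today.format(DAY);
        clauses.add(range(field, todayStr, todayStr, maxBoost));
        for (int k = 1; maxBoost - k > 0; k++) {
            // k-th week back: [today - 7k days, today - 7(k-1) - 1 days], 7 days long
            LocalDate lower = today.minusDays(7L * k);
            LocalDate upper = today.minusDays(7L * (k - 1) + 1);
            clauses.add(range(field, lower.format(DAY), upper.format(DAY), maxBoost - k));
        }
        return clauses;
    }

    // Inclusive range clause in QueryParser syntax with a boost suffix.
    private static String range(String field, String from, String to, int boost) {
        return field + ":[" + from + " TO " + to + "]^" + boost;
    }

    /** The user's query is required; the date tiers are optional clauses that only add score. */
    static String withDateBoost(String userQuery, String field, LocalDate today, int maxBoost) {
        return "+(" + userQuery + ") " + String.join(" ", boostClauses(field, today, maxBoost));
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2006, 5, 16);
        System.out.println(withDateBoost("lucene scoring", "date", today, 3));
    }
}
```

For 2006-05-16 and x = 3, `main` prints `+(lucene scoring) date:[20060516 TO 20060516]^3 date:[20060509 TO 20060515]^2 date:[20060502 TO 20060508]^1`. The boosts are combined with the relevance score rather than replacing it. An older but highly relevant document can therefore still outrank a newer one.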
This will not be a perfect sort on the dates, but it will boost newer documents, depending on your date ranges.

HTH
Aviran
http://www.aviransplace.com

-----Original Message-----
From: Marcus Falck [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 2:43 PM
To: java-user@lucene.apache.org
Subject: Changing the scoring (newest doc date first)

Hello,

I'm working on a very large implementation of a search engine based on the Lucene API (1.4.3). We have also been investigating enterprise search companies such as FAST and Verity, but have come to the conclusion that we might as well save ourselves 1 million dollars by doing our own implementation on Lucene.

What we are talking about here is indexing data from a lot of different systems, all containing A LOT of documents. This index will be distributed by range (date) and scaled with 1 or more machines containing the same index per range (load balanced using round robin). Currently the total size of all documents we need to index is around 2TB (200 million documents), and this is growing by approximately 200,000 documents on a daily basis.

I have already written code for a prototype that contains a fetcher application (for fetching data from the original systems' storage and distributing the documents using SOAP over TCP to the correct data interval and that interval's machines), SearchMachineHost (the actual index/search per machine), a Search/Index API (that adds transparency to the whole clustering part), AlertHost (for time-sensitive alerts) and demo applications. Everything looks very good; we are very satisfied with the performance.

---PROBLEM---

There is, however, one LARGE problem that we have run into. All search results should be displayed sorted with the newest document at the top. We tried to accomplish this using Lucene's sort capabilities but quickly ran into large performance bottlenecks. So I figured that, since the default sort is by relevance, I would like to change the relevance so that we don't even need to sort the documents.
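The date-range distribution described in this message can be sketched as follows. This is a minimal illustration only. The class, method and host names are hypothetical and are not part of Lucene or of the prototype described above. It assumes each range is identified by its start date and runs until the next range begins. A new document is sent to every machine of the range that contains its date, since those machines hold the same index. A search picks one machine per range, rotating round robin.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

public class DateRangeRouter {
    /** One date range, served by one or more machines holding identical copies of its index. */
    static final class Partition {
        final List<String> replicas;                     // hypothetical host names
        private final AtomicInteger next = new AtomicInteger();

        Partition(List<String> replicas) {
            this.replicas = List.copyOf(replicas);
        }

        /** Round-robin choice among the identical replicas (floorMod survives int overflow). */
        String pickReplica() {
            return replicas.get(Math.floorMod(next.getAndIncrement(), replicas.size()));
        }
    }

    // Keyed by the range's inclusive start date; a range ends where the next one starts.
    private final NavigableMap<LocalDate, Partition> byStart = new TreeMap<>();

    void addPartition(LocalDate start, List<String> replicas) {
        byStart.put(start, new Partition(replicas));
    }

    /** The partition whose range contains the given date. */
    Partition partitionFor(LocalDate docDate) {
        Map.Entry<LocalDate, Partition> e = byStart.floorEntry(docDate);
        if (e == null) throw new IllegalArgumentException("no partition covers " + docDate);
        return e.getValue();
    }

    /** Indexing: the document goes to every replica of its range. */
    List<String> indexTargets(LocalDate docDate) {
        return partitionFor(docDate).replicas;
    }

    /** Searching: one replica per range, chosen round robin, newest range first. */
    List<String> searchTargets() {
        List<String> targets = new ArrayList<>();
        for (Partition p : byStart.descendingMap().values()) {
            targets.add(p.pickReplica());
        }
        return targets;
    }

    public static void main(String[] args) {
        DateRangeRouter router = new DateRangeRouter();
        router.addPartition(LocalDate.of(2006, 1, 1), List.of("idx-a1", "idx-a2"));
        router.addPartition(LocalDate.of(2006, 5, 1), List.of("idx-b1", "idx-b2", "idx-b3"));

        System.out.println(router.indexTargets(LocalDate.of(2006, 5, 16)));
        System.out.println(router.searchTargets());
        System.out.println(router.searchTargets());
    }
}
```

Writes have to reach every copy of a range's index, while reads need only one. Rotating the read target per range is what spreads the search load across the machines.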
I guess a lot of people on this mailing list can give me valuable hints about how to accomplish this! (Since I know about the ability to sort by index id, which I haven't tried, I should also add that I can't guarantee that all documents will be added in correct date order. Remember the several systems; the future plan is to buy content from different actors on the market and index it.)

Please help me in my fight against FAST and Verity =D

Regards,
Marcus Falck, Stockholm, Sweden

I would also like to thank all the people who have been involved in Lucene development. Very nice work!