Hi Günther,

We did think in terms of an index that contains the search profiles, which we would then search using the documents as queries, but we couldn't really figure out how to do it. We have an alert service up and running today that uses Verity's implementation of alerts. So we looked at the Verity documentation and realised that they don't handle alerts using an inverted index. We therefore implemented our new alert service the same way the Verity service works today, which seems to work nicely. But if you have a concrete solution for how to achieve an inverted index that stores fairly complex queries, you are more than welcome to share it.

What I want to accomplish is a central index for a lot of large backend systems containing a lot of articles: for example news polled from the web, newspapers delivered to us in electronic form, and third-party document databases. So what we have done is implement a search engine with Lucene as the core. This engine is scalable both in terms of range and round-robin/range. Fetcher applications fetch documents from the different storages, transform them into a common format, and then distribute them to all search machines matching that range. The range clustering is built on date ranges. Since we are going to buy document databases from other companies, we can't guarantee that all data will be added in date order. The volume of data we are talking about is around 500 million news articles.

The end users, and a lot of our internal processes for value-added services, then define search queries for the things they want to monitor. In the end user's case this is called an "agent". When the user logs in to the system and clicks on an agent, the matching articles are presented in DATE order (newest first). The date order is critical. Relevance is not important, since we have value-added services such as quality control of the hits.
So the last thing to do in order to get a fully functional proof of concept up is to fix the date-order presentation. Since it's a lot of data and the IndexSearcher will be recreated pretty often, we will need to change the Lucene scoring/ranking. I can't understand why this should be so hard, but I don't have any clue what the best practices for doing so are.

Regards,
Marcus
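To make the requirement above concrete, here is a minimal sketch of "every match counts the same, newest first": a bounded collector that keeps only the N newest hits and ignores relevance. It is plain Java and does not use Lucene's API. The names `DatedHit` and `NewestFirstCollector` are hypothetical, and it assumes each hit's date is available as a `long` at collection time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit record: a document id plus its publication date. */
final class DatedHit {
    final int docId;
    final long dateMillis;

    DatedHit(int docId, long dateMillis) {
        this.docId = docId;
        this.dateMillis = dateMillis;
    }
}

/** Keeps the N newest hits seen so far; relevance is ignored entirely. */
final class NewestFirstCollector {
    private static final Comparator<DatedHit> OLDEST_FIRST =
        Comparator.comparingLong((DatedHit h) -> h.dateMillis);

    private final int limit;
    // Min-heap on date: the head is the oldest hit currently kept.
    private final PriorityQueue<DatedHit> heap;

    NewestFirstCollector(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, OLDEST_FIRST);
    }

    /** Called once per matching document, in any order. */
    void collect(int docId, long dateMillis) {
        if (heap.size() < limit) {
            heap.add(new DatedHit(docId, dateMillis));
        } else if (dateMillis > heap.peek().dateMillis) {
            heap.poll(); // drop the oldest kept hit
            heap.add(new DatedHit(docId, dateMillis));
        }
    }

    /** Returns the kept hits sorted newest first. */
    List<DatedHit> newestFirst() {
        List<DatedHit> out = new ArrayList<>(heap);
        out.sort(OLDEST_FIRST.reversed());
        return out;
    }
}
```

Because the heap never holds more than N entries, memory stays constant no matter how many articles match, and documents may arrive in any order, which matters when data is not added in date order.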
________________________________
From: Günther Starnberger [mailto:[EMAIL PROTECTED]
Sent: Thu 2006-05-18 23:22
To: java-user@lucene.apache.org
Subject: Re: SV: Sort problematics

On Thu, May 18, 2006 at 10:53:23PM +0200, Marcus Falck wrote:

Hello,

> The term scorer will give higher score on documents containing both
> terms. This is a problem (in our application) since in this case want
> the same score on documents as long as they contain 1 of the terms
> (since we are dealing with newsletter observation for companies they
> want to get the hits ordered by date to get the complete overview). I
> tested to rewrite the TermScorer to give me the same score with
> success. So my question is.

What exactly do you want to achieve with your application? You speak of "immediate alerts". I understand this as: your users specify keywords or queries, and when you receive a new document that matches a query you alert the user. Is this what you want to do?

If so, I don't think that Lucene is useful for this kind of real-time query. Instead of using an inverted index it would make more sense to use a normal index which contains the terms you search for. When you receive a new document, look up each term of the document in that index.

It _might_ be possible to do this with Lucene by storing the search terms as documents and using the documents which you receive as queries, but I guess it isn't that trivial.

If you need a combination of traditional search and real-time alerts, a hybrid solution may make sense. But using Lucene for real-time search isn't a good idea (at least IMO).

bye,
/gst
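The "normal index which contains the terms you search for" suggested above can be sketched roughly as follows. This is a minimal illustration in plain Java, not Lucene code. It assumes each profile is a simple OR over single terms, which is much simpler than the complex queries mentioned earlier in the thread. `ProfileIndex`, `addProfile`, and `match` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical term-to-profile lookup table; not part of Lucene. */
final class ProfileIndex {
    // term -> ids of the profiles that want an alert for that term
    private final Map<String, Set<String>> profilesByTerm = new HashMap<>();

    /** Registers a profile that matches when ANY of its terms occurs. */
    void addProfile(String profileId, String... terms) {
        for (String term : terms) {
            profilesByTerm
                .computeIfAbsent(normalize(term), k -> new HashSet<>())
                .add(profileId);
        }
    }

    /** Returns the ids of all profiles with at least one term in the text. */
    Set<String> match(String documentText) {
        Set<String> hits = new TreeSet<>();
        // Naive tokenizer: split on anything that is not a letter or digit.
        for (String token : documentText.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            Set<String> profiles = profilesByTerm.get(normalize(token));
            if (profiles != null) {
                hits.addAll(profiles);
            }
        }
        return hits;
    }

    private static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }
}
```

The cost of matching one incoming document depends on the number of tokens in that document, not on the number of stored profiles. Phrase, boolean AND/NOT, or range conditions would need a second verification step per candidate profile, which this sketch does not attempt.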