Hi,

our application indexes its logging events as documents. When the index reaches a size limit, I want to delete the oldest 1 million events. Since the number of events varies from day to day, I cannot just blindly delete the last 3 days, for instance. Based on your various inputs, I decided to run a query capped at 1 million hits, sorted in index order. I take the last document returned, read its timestamp, and then delete with a new query that adds a criterion on the timestamp field. This is good enough.
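The two steps described above (find the timestamp of the N-th oldest event, then delete everything up to that timestamp) can be sketched as follows. This is only an in-memory model, not Lucene code: the class `TimestampPruner` and its methods are hypothetical names, and a plain list of timestamps stands in for the index. In a real index the first step would be a sorted search capped at N hits and the second a delete-by-query with a range on the timestamp field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical in-memory stand-in for the index: one Long timestamp per event.
final class TimestampPruner {

    /** Step 1: timestamp of the last hit of an "oldest first, at most n hits" query. */
    static OptionalLong cutoffTimestamp(List<Long> timestamps, int n) {
        if (n <= 0 || timestamps.isEmpty()) {
            return OptionalLong.empty();              // nothing to delete
        }
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.naturalOrder());       // oldest first
        int last = Math.min(n, sorted.size()) - 1;    // index of the last capped hit
        return OptionalLong.of(sorted.get(last));
    }

    /** Step 2: delete every event with timestamp <= cutoff; returns how many were removed. */
    static int deleteUpTo(List<Long> timestamps, long cutoff) {
        int before = timestamps.size();
        timestamps.removeIf(ts -> ts <= cutoff);
        return before - timestamps.size();
    }
}
```

Because the second step deletes by timestamp rather than by document, events that share the cutoff timestamp are all removed, so the count deleted can be somewhat higher than N. For log pruning that is usually acceptable.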
Thanks all for your help,
Vincent

Chris Hostetter <hossman_luc...@fucit.org>
14.09.2011 22:04
Please respond to java-user@lucene.apache.org
To: java-user@lucene.apache.org, simon.willna...@gmail.com
cc:
Subject: Re: deleting with sorting and max document

: can you provide your query which yields all the documents that you
: want to delete? I don't understand how the sort order changes anything
: here. if you want to only delete the top N docs of that query you
: should maybe modify your query to only return those. I could imagine
: you are returning the oldest first, if so can't you do a range filter
: on top instead of sorting?

i suspect the succinct problem description is something like "i want to only have the X newest docs that match query Q in my index, so i want to execute Q, find the total number of matches N, and then delete the first N-X docs matching Q when sorted by field F"

Hypothetical example: a news aggregation site, with various contracts with other news sites that say things like "only allowed to redisplay at most 1000 articles from the NY Times at any one time" and the people running the site want to always include the 1000 newest NYT articles and delete the older ones.

I suspect the most efficient way to deal with this would be to give every document a unique id that is guaranteed to always increase. then decide how many docs you need to delete, and execute a query sorting on that id field asc using that num docs as the size of a TopSortedDocs, and find the id of the "newest" doc that you want to delete, then reformulate the query to include a range query on the id field with that value. if the num of docs to delete is too big to deal with TopSortedDocs, then paginate through until you get the number you need.
(you can do the same thing w/o the unique id using a date field, but you run the risk of overdeleting if multiple docs have the same date)

-Hoss
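
The id-based variant suggested in the reply can be sketched in the same spirit. Again this is a hedged in-memory model rather than Lucene code: `IdRangePruner`, `Event` and `keepNewest` are hypothetical names, and the list stands in for the set of documents matching Q. It assumes every event carries a unique, strictly increasing id, so a range on the id deletes exactly N-X documents even when several events share a timestamp.

```java
import java.util.List;

// Hypothetical model: each event has a unique, always-increasing id.
final class IdRangePruner {

    static final class Event {
        final long id;
        final long timestamp;

        Event(long id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    /** Keeps the `keep` newest events (highest ids); returns the number deleted. */
    static int keepNewest(List<Event> index, int keep) {
        if (keep < 0) {
            throw new IllegalArgumentException("keep must be >= 0");
        }
        int toDelete = index.size() - keep;           // N - X
        if (toDelete <= 0) {
            return 0;                                 // already within the limit
        }
        // Sort ids ascending and pick the id of the "newest" doc to delete.
        long cutoffId = index.stream()
                .mapToLong(e -> e.id)
                .sorted()
                .skip(toDelete - 1)
                .findFirst()
                .getAsLong();
        // Equivalent of a delete-by-query with a range "id <= cutoffId".
        int before = index.size();
        index.removeIf(e -> e.id <= cutoffId);
        return before - index.size();
    }
}
```

With five events where the three oldest share one timestamp, `keepNewest(index, 3)` removes exactly two events, whereas a timestamp cutoff on the same data would remove all three tied events.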