Hi! I've got exactly the same problem.

Maybe the previously discussed patch could be extended to fragment the fields of one document into separate files, so that a single fragment can be updated on its own? Updating frequently changing fields (ACLs or other metadata, maybe even a PageRank value for Nutch?) would then be cheaper. It would also make it easy to 'render' ACLs onto the documents they influence whenever the ACLs change. After all, you don't change ACLs as often as you access documents. I guess this would be hard, as the lexicon is stored elsewhere... Any ideas?

It would of course be even better to separate these properly into different indices and be able to map document IDs across them. Updating would be rather simple, and retrieval could be done in parallel. Maybe a custom RelationalMultiSearcher would be in order?
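To make the separate-indices idea a bit more concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class SplitIndexSketch and its methods are hypothetical, the two maps merely stand in for a content index and an ACL index, and it assumes a stable application-level document ID shared by both sides (Lucene's internal document numbers can change when segments are merged, so they could not be used directly).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names are Lucene classes.
class SplitIndexSketch {
    // "Content index": term -> IDs of the documents containing it. Rarely rewritten.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // "ACL index": document ID -> principals allowed to read it. Cheap to update.
    private final Map<Integer, Set<String>> readers = new HashMap<>();

    void addDocument(int docId, String text, String... allowed) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        setAcl(docId, allowed);
    }

    // Changing an ACL touches only the ACL side; the postings are left alone.
    void setAcl(int docId, String... allowed) {
        readers.put(docId, new HashSet<>(Arrays.asList(allowed)));
    }

    // Look up the term on the content side, then keep only IDs the user may read.
    List<Integer> search(String term, String user) {
        List<Integer> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term.toLowerCase(), Collections.emptySet())) {
            if (readers.getOrDefault(docId, Collections.emptySet()).contains(user)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SplitIndexSketch index = new SplitIndexSketch();
        index.addDocument(1, "quarterly budget report", "alice");
        index.addDocument(2, "budget meeting notes", "alice", "bob");
        System.out.println(index.search("budget", "bob")); // [2]

        index.setAcl(1, "alice", "bob"); // ACL update only, no re-indexing of text
        System.out.println(index.search("budget", "bob")); // [1, 2]
    }
}
```

The point of the split is visible in setAcl: an ACL change never rewrites the term postings, so the indexed-but-not-stored text is not lost. How to do the join efficiently on real on-disk indices is exactly the open question here.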
I've also thought about combining document-based and field-based fragmentation strategies. Since we will need subsecond search and update performance on a multi-million document index in the near future, this seems the way to go. Hardware would not really be an issue here, but of course we want to be efficient, especially in a multi-processor environment. Have there been any thoughts about this?

Best regards,

Max Pfingsthorn

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel +31 (0)20 5224466
-------------------------------------------------------------
[EMAIL PROTECTED] / www.hippo.nl
-------------------------------------------------------------

-----Original Message-----
From: Markus Wiederkehr [mailto:[EMAIL PROTECTED]
Sent: Monday, May 30, 2005 09:47
To: Lucene users
Subject: ACLs and Lucene

I am working on a Document Management System where every document has an Access Control List attached to it. Obviously a search result should only consist of documents that may be viewed by the currently logged-in user.

I can think of three strategies to accomplish this goal:

1) using Filter and FilteredQuery
2) filtering the search result
3) somehow storing the ACL elements as Lucene fields

But each approach has serious drawbacks.

The first one degrades rapidly as the number of documents increases. Think of determining the viewability of 10,000 documents where you need several SQL queries per document.

The second approach also degrades badly when a user has access to only a very small subset of all documents. There could be thousands of false hits before the first viewable document is reached.

The third approach looks most promising to me, but it would require updating Lucene documents whenever an ACL changes. Unfortunately it is not possible to update Lucene documents without losing fields that are indexed but not stored, right?

So my question is: is there another approach or a "standard solution" I did not think of? Or how did others solve this problem?
Thanks in advance,

Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]