Hi!

I've got exactly the same problem. Could the previously discussed patch be
extended to fragment the fields of one document into separate files, so that
a single fragment can be updated on its own? Updating frequently changing
fields (ACLs or other metadata, maybe even a PageRank value for Nutch?) would
then be much cheaper. It would also make it easy to 'render' ACLs onto the
documents they influence at the moment the ACLs change. After all, ACLs
change far less often than documents are accessed. I guess this would be
hard, as the lexicon is stored elsewhere... Any ideas?

It would of course be even better to separate these properly into different
indices and map document ids across them. Updating would then be rather
simple, and retrieval could be done in parallel. Maybe a custom
RelationalMultiSearcher would be in order?
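
To make the idea of separate indices joined by a shared document key a bit
more concrete, here is a minimal sketch. It is a toy model built on plain
Java collections, not Lucene code: the class FragmentedIndex, its methods,
and the string document keys are all hypothetical names chosen for
illustration, and a real implementation would have to map Lucene's internal
document numbers across indices somehow.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy model: content and ACLs live in separate maps joined by a shared key. */
class FragmentedIndex {
    // term -> keys of documents containing it (stands in for the content index)
    private final Map<String, Set<String>> contentIndex = new HashMap<>();
    // principal -> keys of documents it may read (stands in for the ACL index)
    private final Map<String, Set<String>> aclIndex = new HashMap<>();

    void addContent(String docKey, String... terms) {
        for (String term : terms) {
            contentIndex.computeIfAbsent(term, k -> new TreeSet<>()).add(docKey);
        }
    }

    /** Replaces the ACL of one document; the content index is not touched. */
    void setAcl(String docKey, String... principals) {
        for (Set<String> docs : aclIndex.values()) {
            docs.remove(docKey);
        }
        for (String principal : principals) {
            aclIndex.computeIfAbsent(principal, k -> new TreeSet<>()).add(docKey);
        }
    }

    /** Joins both fragments: documents that match the term and are readable by the principal. */
    Set<String> search(String term, String principal) {
        Set<String> matching = contentIndex.getOrDefault(term, Collections.emptySet());
        Set<String> readable = aclIndex.getOrDefault(principal, Collections.emptySet());
        Set<String> hits = new TreeSet<>(matching);
        hits.retainAll(readable);
        return hits;
    }
}
```

The point of the split is that setAcl never rewrites the content side, so an
ACL change costs only as much as the ACL fragment itself. The two lookups in
search are independent, which is what would allow them to run in parallel.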

I've also thought about combining document-based and field-based
fragmentation strategies. Since we will need subsecond search and update
performance on a multi-million-document index in the near future, this seems
the way to go. Hardware is not really an issue here, but of course we want to
be efficient, especially in a multi-processor environment. Have there been
any thoughts about this?

Best regards,

Max Pfingsthorn

Hippo  

Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
[EMAIL PROTECTED] / www.hippo.nl
--------------------------------------------------------------



-----Original Message-----
From: Markus Wiederkehr [mailto:[EMAIL PROTECTED]
Sent: Monday, May 30, 2005 09:47
To: Lucene users
Subject: ACLs and Lucene


I am working on a Document Management System where every document has
an Access Control List attached to it. Obviously a search result
should only consist of documents that may be viewed by the currently
logged in user.

I can think of three strategies to accomplish this goal:

1) using Filter and FilteredQuery
2) filtering the search result
3) somehow storing the ACL elements as Lucene fields

But each approach has serious drawbacks.

The first one degrades rapidly as the number of documents increases.
Think of determining the viewability of 10,000 documents where you
need several SQL queries per document.
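
The cost described above can be sketched as follows. As far as I know, a
Lucene Filter of this era returns a java.util.BitSet with one bit per
document in the index; the sketch below only models that shape with the
standard library. The class AclFilterSketch and the canView callback are
hypothetical stand-ins for the real permission check (for example the SQL
lookups mentioned above).

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical sketch: builds the per-document bit set a filter would hand to the searcher. */
class AclFilterSketch {
    /**
     * Sets bit i when document i is viewable.
     * canView is invoked exactly maxDoc times, which is where the cost comes from.
     */
    static BitSet buildBits(int maxDoc, IntPredicate canView) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (canView.test(doc)) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

With 10,000 documents the callback runs 10,000 times per user, regardless of
how many documents the query itself matches. Caching the resulting bit set
per user would amortize this, under the assumption that the cache is
invalidated whenever an ACL changes.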

The second approach also degrades badly when a user has access to a
very small subset of all documents. There could be thousands of false
hits before the first viewable document is reached.

The third approach looks most promising to me but would require updating
Lucene documents whenever an ACL changes. Unfortunately it is not possible
to update Lucene documents without losing fields that are indexed but not
stored, right?
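
The update problem can be illustrated with a toy model. This is not Lucene
code: ToyIndex and all of its methods are hypothetical, and it only assumes
that an "update" has to be done as delete plus re-add, with the re-added
document rebuilt from whatever was stored.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (not Lucene): indexed values are searchable, stored values are retrievable. */
class ToyIndex {
    // "field:value" -> keys of documents indexed under it
    private final Map<String, Set<String>> postings = new HashMap<>();
    // document key -> its stored fields
    private final Map<String, Map<String, String>> stored = new HashMap<>();

    void add(String key, Map<String, String> indexedFields, Map<String, String> storedFields) {
        for (Map.Entry<String, String> f : indexedFields.entrySet()) {
            postings.computeIfAbsent(f.getKey() + ":" + f.getValue(), k -> new HashSet<>()).add(key);
        }
        stored.put(key, new HashMap<>(storedFields));
    }

    void delete(String key) {
        for (Set<String> docs : postings.values()) {
            docs.remove(key);
        }
        stored.remove(key);
    }

    boolean matches(String key, String field, String value) {
        return postings.getOrDefault(field + ":" + value, Collections.emptySet()).contains(key);
    }

    /**
     * Changes the ACL by delete plus re-add, rebuilding the document from its stored fields.
     * Assumes the document exists. Anything that was indexed but not stored is lost.
     */
    void updateAcl(String key, String newAcl) {
        Map<String, String> kept = new HashMap<>(stored.get(key));
        delete(key);
        Map<String, String> reindexed = new HashMap<>(kept);
        reindexed.put("acl", newAcl);
        add(key, reindexed, kept);
    }
}
```

If a document is added with an indexed-only body field and a stored title,
then after updateAcl the title and the new ACL are searchable again, but the
body no longer matches, because there was nothing to rebuild it from.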

So my question is: is there another approach or a "standard solution"
I did not think of? Or how did others solve this problem?

Thanks in advance,

Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

