Re: Design Problem: Searching large set of protected documents

Jonathan O'Connor Tue, 03 Apr 2007 08:28:04 -0700

Michael,
as usual its never so easy! Some users can see almost all documents, and
some other users can see very few.

I did find an interesting document that describes the problem (but offers
no solutions :-() http://www.ideaeng.com/pub/entsrch/v3n4/article01.html.

This article talks about early and late binding of security information.
Early binding is faster, but harder to implement. And of course, I
implemented the easier one.

I'm going to see what the computational and storage cost will be if I
precalculate this info.
Ciao,
Jonathan O'Connor
XCOM Dublin

             "Michael D.                                                   
             Curtin"                                                       
             <[EMAIL PROTECTED]>                                          To 
                                       java-user@lucene.apache.org         
             03/04/2007 15:28                                           cc 

                                                                   Subject 
             Please respond to         Re: Design Problem: Searching large 
             [EMAIL PROTECTED]         set of protected documents          
                apache.org                                                 

Jonathan O'Connor wrote:

> I have a database of a million documents and about 100 users. The
documents
> can have an access control list, and there is a complex, recursive
> algorithm to say if a particular user can see a particular document.
>
> My problem is that my search algorithm is to first do a standard lucene
> search for matching documents, and then check security on each one found,
> just returning the allowed documents. However, if I do this, and the
lucene
> returns 100000 docs, but the user can only see 10 of these, then
obviously
> the search is going to take an awful long time.
>
> Has anyone come across this problem before, and if so what approach did
you
> take? I guess I could precalculate the permissions for every
user-document
> pair, but that's alot of storage, and a lot of precalculation!

My knee-jerk reaction is to suggest a simpler document security model,
but I'm guessing that that option isn't available to you.

In your example the security attributes of a document are far more
discriminating than the query terms.  If that relationship is indicative
of most of your users and most of the documents, the users and documents
aren't updated much, and you have a lot of searching to do,
precalculation (results into an additional document field) seems the way
to go.  It might even turn out that, if you start from a presumption of
calculating every user--document security attribute, you come up with an
algorithm that is much more efficient than a one-off,
can-this-user-see-this-document type of algorithm.

Precalculation isn't necessarily a bad thing.  Often, it's quite
beneficial -- for example, the indexing process itself is a pretty
substantial precalculation step!

If this seems unwieldy or impractical for some reason, perhaps you could
post more attributes of your situation, such as user and data update and
addition frequency, query attributes and frequency, and so on.

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

*** XCOM AG Legal Disclaimer ***

Diese E-Mail einschliesslich ihrer Anhaenge ist vertraulich und ist allein
für den Gebrauch durch den vorgesehenen Empfaenger bestimmt. Dritten ist
das Lesen, Verteilen oder Weiterleiten dieser E-Mail untersagt. Wir bitten,
eine fehlgeleitete E-Mail unverzueglich vollstaendig zu loeschen und uns
eine Nachricht zukommen zu lassen.

This email may contain material that is confidential and for the sole use
of the intended recipient. Any review, distribution by others or forwarding
without express permission is strictly prohibited. If you are not the
intended recipient, please contact the sender and delete all copies.

Hauptsitz: Bahnstrasse 37, D-47877 Willich, USt-IdNr.: DE 812 885 664
Kommunikation: Telefon +49 2154 9209-70, Telefax +49 2154 9209-900,
www.xcom.de
Handelsregister: Amtsgericht Krefeld, HRB 10340
Vorstand: Matthias Albrecht, Renate Becker-Grope, Marco Marty, Dr. Rainer
Fuchs
Vorsitzender des Aufsichtsrates: Stephan Steuer

Re: Design Problem: Searching large set of protected documents

Reply via email to