Re: Searching user-private annotations associated with indexed documents

markharw00d Tue, 27 Nov 2007 19:23:50 -0800

lucene user wrote:

Am I being clear?


Now you are.

I don't know what you mean by "PERSON_ANNOTATION works for Google".

I suppose I meant annotations in the sense GATE and UIMA refer to annotations. They are like a highlighter pen marking a particular section of a document and adding metadata. It is possible to "overlay" such metadata in the token stream so the sentence "Joe Smith works for Google" would have a special "person" annotation token introduced *at the same position* as "Joe Smith" and a "company" annotation token introduced at the same position as "Google". You then have a searchable document that contains both highly specific terms ("Joe Smith", "Google") and their more generic forms ("person", "company"). You can then conduct queries to find out which people "work for Google" and what companies "Joe Smith works for". Sounds like your annotations are a different beast.
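To make the overlay idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class AnnotationOverlay and its methods are hypothetical names invented for illustration, and "Joe Smith" is treated as a single token to keep it short. In Lucene itself the same effect is normally obtained by emitting the annotation token with a position increment of 0, so that it shares a position with the term it describes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a token stream with annotation tokens stacked on positions. */
public class AnnotationOverlay {
    // position -> all terms at that position (the specific term plus any annotations)
    private final Map<Integer, Set<String>> termsAt = new HashMap<>();
    private int length = 0;

    /** Adds a term at the next position, with optional annotation tokens at the same position. */
    public void add(String term, String... annotations) {
        Set<String> slot = new HashSet<>(Arrays.asList(annotations));
        slot.add(term);
        termsAt.put(length++, slot);
    }

    /** Returns true if the phrase terms occur at consecutive positions. */
    public boolean matchesPhrase(String... phrase) {
        for (int start = 0; start + phrase.length <= length; start++) {
            boolean ok = true;
            for (int i = 0; i < phrase.length && ok; i++) {
                ok = termsAt.get(start + i).contains(phrase[i]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Builds the example sentence "Joe Smith works for Google" with overlaid annotations. */
    public static AnnotationOverlay example() {
        AnnotationOverlay doc = new AnnotationOverlay();
        doc.add("joe_smith", "PERSON");   // assumption: the name is one token
        doc.add("works");
        doc.add("for");
        doc.add("google", "COMPANY");
        return doc;
    }

    public static void main(String[] args) {
        AnnotationOverlay doc = example();
        // Which people work for Google?
        System.out.println(doc.matchesPhrase("PERSON", "works", "for", "google"));
        // Which companies does Joe Smith work for?
        System.out.println(doc.matchesPhrase("joe_smith", "works", "for", "COMPANY"));
    }
}
```

Because the generic and the specific token share a position, either one can take part in a phrase query without any change to the phrase-matching logic.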

One way of weaving your "personalisation" functionality cleanly into Lucene would be as follows:
1) Pick up the user's authentication from inside a ServletFilter (assuming you have a standard webapp) and set a thread-local "user" variable.
2) Extend FilterIndexReader with a new "PersonalisedIndexReader" which wraps the standard IndexReader. This class supplies an additional "annotations" field in the TermEnum/TermDocs which is personalised according to the thread-local "user" variable.
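The two steps can be sketched roughly as below. This is plain Java, not Lucene code: the Reader interface is a hypothetical stand-in for the small part of IndexReader that matters here, and SharedReader, PersonalisedReader, searchAs and demo are names made up for the sketch. A real PersonalisedIndexReader would have to serve the extra field through TermEnum/TermDocs instead.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PersonalisedView {
    /** Step 1: set once per request, e.g. by a servlet filter after authentication. */
    public static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Hypothetical stand-in for the part of an index reader used in this sketch. */
    public interface Reader {
        Set<Integer> docsFor(String field, String term);
    }

    /** The shared index: identical for every user. Not thread-safe; sketch only. */
    public static class SharedReader implements Reader {
        private final Map<String, Set<Integer>> postings = new HashMap<>();

        public void index(String field, String term, int docId) {
            postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
        }

        @Override
        public Set<Integer> docsFor(String field, String term) {
            return postings.getOrDefault(field + ":" + term, Collections.<Integer>emptySet());
        }
    }

    /** Step 2: wraps the shared reader and adds a per-user "annotations" field. */
    public static class PersonalisedReader implements Reader {
        private final Reader shared;
        private final Map<String, SharedReader> perUser = new HashMap<>();

        public PersonalisedReader(Reader shared) {
            this.shared = shared;
        }

        public void annotate(String user, String term, int docId) {
            perUser.computeIfAbsent(user, u -> new SharedReader()).index("annotations", term, docId);
        }

        @Override
        public Set<Integer> docsFor(String field, String term) {
            if (!"annotations".equals(field)) {
                return shared.docsFor(field, term);   // ordinary fields pass straight through
            }
            String user = CURRENT_USER.get();
            SharedReader mine = (user == null) ? null : perUser.get(user);
            return (mine == null) ? Collections.<Integer>emptySet() : mine.docsFor(field, term);
        }
    }

    /** Runs one lookup as the given user, clearing the thread-local afterwards. */
    public static Set<Integer> searchAs(String user, Reader reader, String field, String term) {
        CURRENT_USER.set(user);
        try {
            return reader.docsFor(field, term);
        } finally {
            CURRENT_USER.remove();
        }
    }

    /** Small fixture: two shared documents, each annotated by a different user. */
    public static PersonalisedReader demo() {
        SharedReader shared = new SharedReader();
        shared.index("text", "lucene", 1);
        shared.index("text", "lucene", 2);
        PersonalisedReader reader = new PersonalisedReader(shared);
        reader.annotate("alice", "urgent", 1);
        reader.annotate("bob", "urgent", 2);
        return reader;
    }

    public static void main(String[] args) {
        PersonalisedReader reader = demo();
        System.out.println(searchAs("alice", reader, "annotations", "urgent"));
        System.out.println(searchAs("bob", reader, "annotations", "urgent"));
    }
}
```

The calling code never passes a user around explicitly; the same query gives a different answer on the "annotations" field depending on who the current thread is serving.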

This approach allows you to use all of the standard Lucene functionality and yet have a personalised view of the index for each user of the system. The join between matches on the personalised "annotations" field and the standard Lucene text fields can be optimised because of the low-level "TermDocs.skipTo" calls the query logic makes when executing Boolean logic. This level of optimisation could not be achieved if you did this join outside of Lucene.

The hard bit is managing the details behind storage/retrieval of the user's annotations and managing references to doc Ids in this reader but overall this seems like a clean way of packaging it all together.
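For a feel of why skipTo helps with the join, here is a simplified sketch of a "leapfrog" intersection of two sorted doc-id lists. It is plain Java, and the Postings class is a hypothetical stand-in for a postings iterator: its skipTo returns the doc id directly, which is a simplification and not the actual TermDocs signature.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipToJoin {
    /** Sentinel returned once a postings list is exhausted. */
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical postings list over sorted doc ids with a forward-only skipTo. */
    public static class Postings {
        private final int[] docs;   // must be sorted ascending
        private int idx = 0;        // never moves backwards

        public Postings(int... docs) {
            this.docs = docs;
        }

        /** Advances to the first doc id >= target and returns it, or NO_MORE_DOCS. */
        public int skipTo(int target) {
            int lo = idx;
            int hi = docs.length;
            while (lo < hi) {                 // binary search over the remaining ids
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            idx = lo;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** AND of two postings lists: each side skips straight to the other's current doc. */
    public static List<Integer> intersect(Postings a, Postings b) {
        List<Integer> out = new ArrayList<>();
        int da = a.skipTo(0);
        int db = b.skipTo(0);
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) {
                out.add(da);
                da = a.skipTo(da + 1);
                db = b.skipTo(db + 1);
            } else if (da < db) {
                da = a.skipTo(db);   // jump over everything that cannot match
            } else {
                db = b.skipTo(da);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Postings text = new Postings(1, 5, 9, 20, 33);        // matches on a text field
        Postings annotations = new Postings(5, 20, 21, 40);   // matches on "annotations"
        System.out.println(intersect(text, annotations));
    }
}
```

When the annotation matches are sparse, the text side is asked to jump directly to the few candidate docs, so most of its postings are never visited. A join done outside the index would have to materialise both full result lists first.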


Cheers,
Mark

On Nov 27, 2007 7:54 AM, mark harwood <[EMAIL PROTECTED]> wrote:

Do the annotations have positions ?

Do you want to do things like phrase-search e.g.
     "PERSON_ANNOTATION works for Google"

Or is your idea of an annotation more simply a del.icio.us-style tag associated
with the whole document?

Cheers
Mark


----- Original Message ----
From: lucene user <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 27 November, 2007 12:31:38 PM
Subject: Re: Searching user-private annotations associated with indexed 
documents

I'd be VERY grateful for your help, folks! Thanks! I really need some
insight on this. THANKS!!

On Nov 26, 2007 6:43 PM, lucene user <[EMAIL PROTECTED]> wrote:

Here are the three options that seem practical to us right now.

(1) Do the annotation search in postgres using LIKE or the
   postgres native full text search. Take the resulting list
   of file ids and use it to build a filter for the lucene query,
   the way we currently do for folders.

(2) Add a second lucene index that contains only annotations.
   First retrieve a list of file ids satisfying the annotation
   query from this index and use it to create a filter for the
   main lucene query on the archive.
   Whenever annotation text is edited,
     if blank, delete annotation from index
     otherwise add or replace annotation in index.

(3) Add a second lucene index that contains contentrefs.
    This index would contain the same fields as the archive index
    plus the following:
      database_id: list of systemuser_id and contentref_id.
      annotation:  list of all annotation text for this
                   system user and content ref.
      folders:     list of all folder names for this systemuser and
                   content ref

    Whenever an article is added to or removed from a folder,
    or its annotation text is edited, the following would occur:
      See if it has an entry in the lucene index for the database.
      if so,
        extract the lucene document from the index.
        if the updated list of folders that contain it is empty,
           delete this document from the lucene database index.
        otherwise,
          update the folder and annotation in the document object.
          delete this document from the index.
          add the updated document object to the index.
      if not,
        extract the lucene document for the article from the archive index
        add the database_id, folders, and annotation fields to this object
        add the document object to the lucene database index.
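
Options (1) and (2) share the same shape: an annotation store answers with a list of file ids, and that list becomes a filter over the main query's hits. A minimal sketch of that shape in plain Java follows. The class AnnotationFilter and its methods are hypothetical, the substring match stands in for whichever annotation search is used, and file ids are used directly as bit positions. A real Lucene filter is keyed by Lucene's internal document numbers, so a mapping from file id to document number would also be needed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-user annotation store that can produce a filter of file ids. */
public class AnnotationFilter {
    // fileId -> annotation text for one user
    private final Map<Integer, String> annotations = new HashMap<>();

    /** The edit rule from option (2): blank text deletes, anything else adds or replaces. */
    public void update(int fileId, String text) {
        if (text == null || text.trim().isEmpty()) {
            annotations.remove(fileId);
        } else {
            annotations.put(fileId, text);
        }
    }

    /** Builds a bit set with one bit per file whose annotation contains the word. */
    public BitSet filterFor(String word) {
        BitSet bits = new BitSet();
        String needle = word.toLowerCase();
        for (Map.Entry<Integer, String> e : annotations.entrySet()) {
            if (e.getValue().toLowerCase().contains(needle)) {
                bits.set(e.getKey());
            }
        }
        return bits;
    }

    /** Keeps only those hits from the main query that the filter allows. */
    public static List<Integer> apply(BitSet filter, int[] mainHits) {
        List<Integer> kept = new ArrayList<>();
        for (int id : mainHits) {
            if (filter.get(id)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        AnnotationFilter store = new AnnotationFilter();
        store.update(3, "Follow up with legal");
        store.update(7, "legal review done");
        store.update(7, "");                      // user cleared the annotation
        int[] mainHits = {1, 3, 7, 9};            // file ids matching the document query
        System.out.println(apply(store.filterFor("legal"), mainHits));
    }
}
```

The cost of this shape is that the annotation result list is built in full before the main query runs, which is fine while each user's annotations stay small.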

Got a better idea on this?

Thanks!!


On Nov 26, 2007 5:33 PM, lucene user <[EMAIL PROTECTED]> wrote:

Folks

I have some additional textual data that is user specific, basically
annotations about documents. I would like to be able to do
**combined** searches, looking for some words in the document and some
in my users' private annotations about that document. Any suggestions
about how I should handle this? The annotations are changeable by
users at any time so we have to be ready to delete them and add others
at any time when the user does edit an annotation.

Do I need a second Lucene index? Can I do a query against two indexes
at the same time? If so, how?

The annotations will be very small but highly volatile. The database
of documents will grow large but nothing will be deleted from it.

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





