lucene user wrote:
Am I being clear?

Now you are.
I don't know what you mean by "PERSON_ANNOTATION works for Google".
I meant annotations in the sense that GATE and UIMA use the term. They are like a highlighter pen marking a particular section of a document and adding metadata. It is possible to "overlay" such metadata in the token stream, so the sentence "Joe Smith works for Google" would have a special "person" annotation token introduced *at the same position* as "Joe Smith" and a "company" annotation token introduced at the same position as "Google". You then have a searchable document that contains both highly specific terms ("Joe Smith", "Google") and their more generic forms ("person", "company"). You can then run queries to find out which people "work for Google" and which companies "Joe Smith works for". It sounds like your annotations are a different beast.
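
To make the "same position" idea concrete, here is a minimal sketch in plain Java. It is a toy positional index and not the Lucene API: the class AnnotatedPositions, its methods, and the annotation lookup table are all hypothetical, and each entity is simplified to a single token ("Joe" rather than "Joe Smith"). In Lucene itself the equivalent effect would come from a token filter that emits the annotation token with a position increment of zero.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Toy positional index for one document: term -> token positions. Not Lucene code. */
final class AnnotatedPositions {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Records a term at a position; several terms may share one position. */
    void add(String term, int position) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(position);
    }

    /**
     * Indexes lower-cased words at consecutive positions. When a word has an
     * entry in the (hypothetical) annotation table, the annotation token is
     * added at the SAME position, like a position increment of zero.
     */
    static AnnotatedPositions index(String[] words, Map<String, String> annotations) {
        AnnotatedPositions idx = new AnnotatedPositions();
        for (int pos = 0; pos < words.length; pos++) {
            String word = words[pos].toLowerCase();
            idx.add(word, pos);
            String label = annotations.get(word);
            if (label != null) {
                idx.add(label, pos); // overlay: generic form shares the position
            }
        }
        return idx;
    }

    /** True if the given terms occur at consecutive positions (a phrase match). */
    boolean matchesPhrase(String... terms) {
        if (terms.length == 0) {
            return false;
        }
        for (int start : postings.getOrDefault(terms[0], new TreeSet<>())) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings.getOrDefault(terms[i], new TreeSet<>()).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

With "joe" mapped to PERSON_ANNOTATION and "google" mapped to COMPANY_ANNOTATION, the phrase "PERSON_ANNOTATION works for google" matches the sentence "Joe works for Google", and so does "joe works for COMPANY_ANNOTATION".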

One way of weaving your "personalisation" functionality cleanly into Lucene would be as follows:
1) Pick up the user's authentication from inside a ServletFilter (assuming you have a standard webapp) and set a thread-local "user" variable.
2) Extend FilterIndexReader with a new "PersonalisedIndexReader" which wraps the standard IndexReader. This class supplies an additional "annotations" field in the TermEnum/TermDocs which is personalised according to the thread-local "user" variable.
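
The two steps above can be sketched roughly as follows. This is plain Java with no Lucene or servlet dependencies: CurrentUser and PersonalisedAnnotations are hypothetical names, and the in-memory map stands in for whatever store actually holds the annotations. A real PersonalisedIndexReader would expose the same per-user lookup through the TermEnum/TermDocs of the "annotations" field instead of a plain method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Holds the authenticated user for the current request thread (step 1). */
final class CurrentUser {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    static void set(String userId) { USER.set(userId); }
    static String get() { return USER.get(); }
    static void clear() { USER.remove(); }
}

/**
 * Stand-in for the personalised view in step 2: the same term lookup returns
 * different doc ids depending on which user the calling thread belongs to.
 */
final class PersonalisedAnnotations {
    // userId -> annotation term -> sorted doc ids (assumed in-memory store)
    private final Map<String, Map<String, TreeSet<Integer>>> byUser = new HashMap<>();

    /** Records that a user's annotation on a document contains the term. */
    void annotate(String userId, int docId, String term) {
        byUser.computeIfAbsent(userId, u -> new HashMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    /** Doc ids whose annotation by the current thread's user contains the term. */
    Set<Integer> docsFor(String term) {
        String user = CurrentUser.get();
        if (user == null) {
            return Collections.<Integer>emptySet(); // no user set: nothing visible
        }
        Map<String, TreeSet<Integer>> terms = byUser.get(user);
        if (terms == null) {
            return Collections.<Integer>emptySet();
        }
        TreeSet<Integer> docs = terms.get(term);
        return docs == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(docs);
    }
}
```

In a webapp the filter would call CurrentUser.set(...) before passing the request down the chain and CurrentUser.clear() in a finally block, so that a pooled thread never carries one user's identity into another user's request.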

This approach allows you to use all of the standard Lucene functionality and yet have a personalised view of the index for each user of the system. The join between matches on the personalised "annotations" field and the standard Lucene text fields can be optimised because of the low-level "TermDocs.skipTo" calls the query logic makes when executing Boolean logic. This level of optimisation could not be achieved if you did this join outside of Lucene. The hard bit is managing the details behind storage/retrieval of the user's annotations and managing references to doc Ids in this reader but overall this seems like a clean way of packaging it all together.
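
To illustrate why skipTo-style calls make this join cheap, here is a small sketch of a "leapfrog" intersection over two sorted doc-id lists. It is a plain Java model, not Lucene code: DocIdCursor and its int-returning skipTo are hypothetical and do not match the signature of TermDocs.skipTo. The point is only that each side jumps straight to the other side's current doc id rather than stepping through every posting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Cursor over a sorted array of unique doc ids with a skipTo-style jump. */
final class DocIdCursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private int pos = 0;

    DocIdCursor(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Moves to the first doc id >= target and returns it, or NO_MORE_DOCS. */
    int skipTo(int target) {
        if (pos >= docs.length) {
            return NO_MORE_DOCS;
        }
        // Binary search only over the part of the list not yet consumed.
        int r = Arrays.binarySearch(docs, pos, docs.length, target);
        pos = r >= 0 ? r : -r - 1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Doc ids present in both cursors, e.g. annotation matches AND text matches. */
    static List<Integer> intersect(DocIdCursor a, DocIdCursor b) {
        List<Integer> out = new ArrayList<>();
        int da = a.skipTo(0);
        int db = b.skipTo(0);
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) {
                out.add(da);
                da = a.skipTo(da + 1); // move past the match
            } else if (da < db) {
                da = a.skipTo(db);     // a leaps forward to b's position
            } else {
                db = b.skipTo(da);     // b leaps forward to a's position
            }
        }
        return out;
    }
}
```

Because a user's annotations are typically sparse, the annotation side drives most of the leaps and the large text postings are barely scanned. A join done outside Lucene would instead have to materialise one full result list before applying the other.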

Cheers,
Mark

On Nov 27, 2007 7:54 AM, mark harwood <[EMAIL PROTECTED]> wrote:
Do the annotations have positions ?

Do you want to do things like phrase-search e.g.
     "PERSON_ANNOTATION works for Google"

Or is your idea of an annotation more simply a del.icio.us-style tag
associated with the whole document?

Cheers
Mark


----- Original Message ----
From: lucene user <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 27 November, 2007 12:31:38 PM
Subject: Re: Searching user-private annotations associated with indexed documents

I'd be VERY grateful for your help, folks! Thanks! I really need some
insight on this. THANKS!!

On Nov 26, 2007 6:43 PM, lucene user <[EMAIL PROTECTED]> wrote:
Here are the three options that seem practical to us right now.

(1) Do the annotation search in Postgres using LIKE or the
   Postgres native full text search. Take the resulting list
   of file ids and use it to build a filter for the Lucene query,
   the way we currently do for folders.

(2) Add a second lucene index that contains only annotations.
   First retrieve a list of file ids satisfying the annotation
   query from this index and use it to create a filter for the
   main lucene query on the archive.
   Whenever annotation text is edited,
     if blank, delete annotation from index
     otherwise add or replace annotation in index.

(3) Add a second lucene index that contains contentrefs.
    This index would contain the same fields as the archive index
    plus the following:
      database_id: list of systemuser_id and contentref_id.
      annotation:  list of all annotation text for this
                   system user and content ref.
      folders:     list of all folder names for this systemuser and
                   content ref

    Whenever an article is added to or removed from a folder,
    or its annotation text is edited, the following would occur:
      See if it has an entry in the lucene index for the database.
      if so,
        extract the lucene document from the index.
        if the updated list of folders that contain it is empty,
           delete this document from the lucene database index.
        otherwise,
          update the folder and annotation in the document object.
          delete this document from the index.
          add the updated document object to the index.
      if not,
        extract the lucene document for the article from the archive index
        add the database_id, folders, and annotation fields to this object
        add the document object to the lucene database index.
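
Options (1) and (2) above both end the same way: a list of file ids from the annotation search becomes a filter over the main query. A minimal sketch of that step, in plain Java and under stated assumptions: AnnotationIdFilter, the file-id-to-doc-id map, and the hit array are all hypothetical, and no Lucene classes are used. A Lucene Filter of this era would hand back a comparable bit set of permitted documents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Turns file ids from an annotation search into a bit-set filter. Not Lucene code. */
final class AnnotationIdFilter {

    /**
     * Sets one bit per permitted document. File ids with no known doc id,
     * or with a doc id outside [0, maxDoc), are silently ignored.
     */
    static BitSet toBits(Collection<String> fileIds,
                         Map<String, Integer> docIdByFileId,
                         int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String fileId : fileIds) {
            Integer docId = docIdByFileId.get(fileId);
            if (docId != null && docId >= 0 && docId < maxDoc) {
                bits.set(docId);
            }
        }
        return bits;
    }

    /** Keeps only the main-query hits whose bit is set, preserving hit order. */
    static List<Integer> keepAllowed(int[] mainHits, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : mainHits) {
            if (docId >= 0 && allowed.get(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }
}
```

The cost of this design is the mapping from file ids to doc ids, which has to be rebuilt or looked up per query, and the bit set itself, which is sized by the whole archive even when only a handful of annotations match.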

Got a better idea on this?

Thanks!!


On Nov 26, 2007 5:33 PM, lucene user <[EMAIL PROTECTED]> wrote:
Folks

I have some additional textual data that is user specific, basically
annotations about documents. I would like to be able to do
**combined** searches, looking for some words in the document and some
in my users' private annotations about that document. Any suggestions
about how I should handle this? The annotations are changeable by
users at any time so we have to be ready to delete them and add others
at any time when the user does edit an annotation.

Do I need a second Lucene index? Can I do a query against two indexes
at the same time? If so, how?

The annotations will be very small but highly volatile. The database
of documents will grow large but nothing will be deleted from it.

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]