I had similar requirements: some fields didn't require text processing; they were just used as filters to focus the search on a subset of documents in Solr. As Karl suggested, implementing a filter was the most direct approach for me.

The issue was that, not being familiar with Solr myself, I couldn't manage to integrate my filter without modifying SolrIndexSearcher. The change was basically to replace every invocation of

    searcher.search(query, new HitCollector() { ... });

with

    searcher.search(query, myCustomFilter, new HitCollector() { ... });

myCustomFilter is an instance of TermsFilter, with document keys added based on a query against an external database. Minor changes were also needed in SolrCore.java to be able to declare the filter in solrconfig.xml. The thing worked OK, but I have always wondered whether that was the best way to integrate the filter.
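
For reference, building such a TermsFilter from an external database could look roughly like this (a minimal sketch; the JDBC query, table name, and field names are hypothetical, and TermsFilter comes from Lucene's contrib queries module):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermsFilter;

public class DbBackedFilterFactory {

    // Builds a filter restricting the search to the documents whose
    // keys come back from the external database.
    public TermsFilter buildFilter(Connection conn) throws Exception {
        TermsFilter filter = new TermsFilter();
        Statement stmt = conn.createStatement();
        try {
            ResultSet rs = stmt.executeQuery(
                "SELECT doc_key FROM mail WHERE status = 'unread'");
            while (rs.next()) {
                // "id" must match the unique key field used at index time.
                filter.addTerm(new Term("id", rs.getString(1)));
            }
        } finally {
            stmt.close();
        }
        return filter;
    }
}

The resulting filter is what gets passed as the second argument in the modified search() call above.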

regards,

Wojciech Strzałka wrote:
The fields that will change most often are, I think:
  Status (read/unread): this is the one I'm most afraid of - any mail
                        coming into the system will need to be indexed
                        at least twice
  Flags: 0..n values from an enum
  Tags:  0..n values from an enum

Of course all the other fields can change too - even the content of
draft messages (it's live content, not archival) - but in that case I'm
ready to go with re-indexing.

Hi Wojciech,

can you please give us a bit more specific information about the
metadata fields that will change? I would recommend looking at creating
filters from your primary persistence for query clauses such as
unread/read, mailbox folders, etc.
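
A minimal sketch of that idea, assuming a hypothetical "folder" field and using core Lucene's QueryWrapperFilter and CachingWrapperFilter:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

public class FolderFilters {

    // Express the mailbox-folder clause as a filter instead of a query
    // clause; CachingWrapperFilter caches the computed bit set per
    // IndexReader, so repeated searches don't recompute it.
    public static Filter forFolder(String folderName) {
        return new CachingWrapperFilter(
            new QueryWrapperFilter(
                new TermQuery(new Term("folder", folderName))));
    }
}

The filter can then be passed to IndexSearcher.search(query, filter, collector); since it never touches the indexed mail content or affects scoring, it is cheap to combine with any query.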
karl

On 12 Sep 2008, at 13:57, Wojciech Strzałka wrote:
Hi.

  I'm new to Lucene and I would like to get answers to a few questions
  (they may be lame).

  I want to index a large amount of email using Lucene (maybe Solr),
  not only the contents but also some metadata like state or flags. The
  problem is that the metadata will change during a mail's lifecycle,
  and although it is much smaller than the content, updating it will
  require re-indexing the whole mail, which I see as a performance
  bottleneck.

  I have the data in a DB as well, so my questions are:

  - are there any best practices for implementing my needs? (querying
    both Lucene & the DB and then merging in memory? closing one eye
    and re-indexing the whole content on every metadata change? others?)

  - is Lucene a good solution for my problem at all?

  - are there any plans to implement field updates more efficiently
    than delete/insert of the whole document? If yes, what's the time
    horizon?
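
For context on that last question: as of Lucene 2.x the only mechanism is IndexWriter.updateDocument(), which deletes the old document and adds a new one, so the full content must be supplied again. A minimal sketch (field names hypothetical):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class MailReindexer {

    // Marking a mail as read still means rebuilding the whole document.
    public void markRead(IndexWriter writer, String mailKey, String body)
            throws Exception {
        Document doc = new Document();
        doc.add(new Field("id", mailKey,
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("status", "read",
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        // The full mail body has to be re-analyzed even though only
        // the status changed - this is the bottleneck in question.
        doc.add(new Field("body", body,
                          Field.Store.NO, Field.Index.TOKENIZED));
        // Deletes any document matching the term, then adds the new one.
        writer.updateDocument(new Term("id", mailKey), doc);
    }
}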


                                       Best regards
                                              Wojtek

