I had similar requirements: some fields didn't require text processing;
they were just used as filters to focus the search on a subset of
documents in Solr. As Karl suggested, implementing a filter was the most
direct approach for me.
The issue was that, not being familiar with Solr myself, I couldn't
manage to integrate my filter without modifying SolrIndexSearcher. The
change was basically to replace every invocation of

    searcher.search(query, new HitCollector() { ... });

with

    searcher.search(query, myCustomFilter, new HitCollector() { ... });
myCustomFilter is an instance of TermsFilter, with document keys added
based on a query against an external database. Minor changes were also
made in SolrCore.java to be able to declare the filter in solrconfig.xml.
The thing worked fine, but I have always wondered whether that was the
best way to integrate the filter.
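To make the idea concrete without a full Solr setup, here is a minimal
plain-Java stand-in for what the TermsFilter does in the approach above:
before hits are collected, candidates are restricted to a set of document
keys fetched from an external database. All names here (KeyFilterSketch,
the map-based "index") are illustrative, not Solr API:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative stand-in for the TermsFilter idea: restrict search hits
// to a set of document keys that came from an external database query.
public class KeyFilterSketch {

    // Stand-in for searcher.search(query, filter, collector): the "index"
    // is a map from document key to content; a hit is collected only if
    // its key is in the filter set AND its content matches the query.
    static List<String> search(Map<String, String> index, String query,
                               Set<String> allowedKeys) {
        return index.entrySet().stream()
                .filter(e -> allowedKeys.contains(e.getKey())) // the filter step
                .filter(e -> e.getValue().contains(query))     // the query step
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();
        index.put("mail-1", "quarterly report attached");
        index.put("mail-2", "quarterly numbers look good");
        index.put("mail-3", "lunch on friday");

        // Pretend these keys came from a SQL query against the external DB.
        Set<String> allowed = new HashSet<>(Arrays.asList("mail-1", "mail-3"));

        // mail-2 matches the query but is filtered out;
        // mail-3 passes the filter but does not match the query.
        System.out.println(search(index, "quarterly", allowed));
    }
}
```

The real thing differs in that Lucene evaluates the filter as a bit set
over internal doc IDs, which is why TermsFilter is much cheaper than
adding the keys to the query itself.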
regards,
Wojciech Strzałka wrote:
The fields that will change most often will, I think, be:
Status (read/unread): in fact this is what I'm most afraid of - any
mail incoming to the system will need to be indexed at
least twice
Flags: 0..n values from an enum
Tags: 0..n values from an enum
Of course all the other fields can also change - even the content of
draft messages (it's live content, not archival) - but in such a case
I'm ready to go with re-indexing.
Hi Wojciech,
can you please give us a bit more specific information about the
metadata fields that will change? I would recommend looking at
creating filters from your primary persistence for query clauses such
as unread/read, mailbox folders, etc.
karl
On 12 Sep 2008, at 13.57, Wojciech Strzałka wrote:
Hi.
I'm new to Lucene and I would like to get a few answers (they may
be lame questions).
I want to index a large amount of email using Lucene (maybe Solr),
not only the contents but also some metadata like state or flags.
The problem is that the metadata will change during the mail's
lifecycle; although the metadata is much smaller, updating it will
require re-indexing the whole mail content, which I see as a
performance bottleneck.
I have the data in a DB as well, so my first questions are:
- are there any best practices for implementing my needs? (querying
both Lucene and the DB and then merging in memory? closing one eye
and re-indexing the whole content on every metadata change? others?)
- is Lucene a good solution for my problem at all?
- are there any plans to implement field updates more efficiently
than delete/insert of the whole document? If yes, what's the time
horizon?
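For the first option above - querying both Lucene and the DB and merging
in memory - a minimal sketch in plain Java might look like the following.
The Lucene search is assumed to have already returned a list of mail IDs;
the volatile read/unread flag lives only in the database. All names here
(MergeSketch, unreadOnly) are illustrative:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of "query both and merge in memory": full-text hits come back
// as mail IDs, volatile metadata (read/unread) lives only in the DB,
// and the two are intersected after the search.
public class MergeSketch {

    static List<Long> unreadOnly(List<Long> luceneHits,
                                 Map<Long, Boolean> readFlagsFromDb) {
        return luceneHits.stream()
                .filter(id -> Boolean.FALSE.equals(readFlagsFromDb.get(id))) // keep unread
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> hits = Arrays.asList(1L, 2L, 3L);   // IDs from the index
        Map<Long, Boolean> read = new HashMap<>();     // flags from the DB
        read.put(1L, true);
        read.put(2L, false);
        read.put(3L, false);
        System.out.println(unreadOnly(hits, read));    // the unread subset
    }
}
```

The attraction is that a status flip is a single DB update and no
re-index; the cost is that result counts and paging get awkward, since
the filtering happens after Lucene has scored and ranked the hits.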
Best regards
Wojtek
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------