I had similar requirements: some fields didn't require text processing;
they were just used as filters to focus the search on a subset of
documents in Solr. As Karl suggested, implementing a filter was the most
direct approach for me.
The issue was that, not being familiar with Solr myself, I couldn't
manage to integrate my filter without modifying SolrIndexSearcher. The
change was basically to replace every invocation of

    searcher.search(query, new HitCollector() { ... });

with

    searcher.search(query, myCustomFilter, new HitCollector() { ... });
myCustomFilter is an instance of TermsFilter, with document keys added
based on a query against an external database. Minor changes were also
made in SolrCore.java to be able to declare the filter in solrconfig.xml.
The thing worked fine, but I have always wondered whether that was the
best way to integrate the filter.
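To make the idea concrete without a full Solr setup, here is a minimal
plain-Java stand-in for what the TermsFilter does in the approach above:
before hits are collected, candidates are restricted to a set of document
keys fetched from an external database. All names here (KeyFilterSketch,
the map-based "index") are illustrative, not Solr API:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative stand-in for the TermsFilter idea: restrict search hits
// to a set of document keys that came from an external database query.
public class KeyFilterSketch {

    // Stand-in for searcher.search(query, filter, collector): the "index"
    // is a map from document key to content; a hit is collected only if
    // its key is in the filter set AND its content matches the query.
    static List<String> search(Map<String, String> index, String query,
                               Set<String> allowedKeys) {
        return index.entrySet().stream()
                .filter(e -> allowedKeys.contains(e.getKey())) // the filter step
                .filter(e -> e.getValue().contains(query))     // the query step
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();
        index.put("mail-1", "quarterly report attached");
        index.put("mail-2", "quarterly numbers look good");
        index.put("mail-3", "lunch on friday");

        // Pretend these keys came from a SQL query against the external DB.
        Set<String> allowed = new HashSet<>(Arrays.asList("mail-1", "mail-3"));

        // mail-2 matches the query but is filtered out;
        // mail-3 passes the filter but does not match the query.
        System.out.println(search(index, "quarterly", allowed));
    }
}
```

The real thing differs in that Lucene evaluates the filter as a bit set
over internal doc IDs, which is why TermsFilter is much cheaper than
adding the keys to the query itself.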
regards,
Wojciech Strzałka wrote:
The fields that will change most often will, I think, be:
Status (read/unread): in fact this is what I'm most afraid of - any
mail incoming to the system will need to be indexed at
least twice
Flags: 0..n values from an enum
Tags: 0..n values from an enum
Of course all the other fields can also change - even the content of
draft messages (it's live content, not archival) - but in such a case
I'm ready to go with re-indexing.
Hi Wojciech,
can you please give us a bit more specific information about the
metadata fields that will change? I would recommend looking at
creating filters from your primary persistence for query clauses such
as unread/read, mailbox folders, etc.
karl
On 12 Sep 2008, at 13.57, Wojciech Strzałka wrote:
Hi.
I'm new to Lucene and I would like to get a few answers (they may
be lame questions).
I want to index a large amount of email using Lucene (maybe Solr),
not only the contents but also some metadata like state or flags.
The problem is that the metadata will change during the mail's
lifecycle; although the metadata is much smaller, updating it will
require re-indexing the whole mail content, which I see as a
performance bottleneck.
I have the data in a DB as well, so my first questions are:
- are there any best practices for implementing my needs? (querying
both Lucene and the DB and then merging in memory? closing one eye
and re-indexing the whole content on every metadata change? others?)
- is Lucene a good solution for my problem at all?
- are there any plans to implement field updates more efficiently
than delete/insert of the whole document? If yes, what's the time
horizon?
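For the first option above - querying both Lucene and the DB and merging
in memory - a minimal sketch in plain Java might look like the following.
The Lucene search is assumed to have already returned a list of mail IDs;
the volatile read/unread flag lives only in the database. All names here
(MergeSketch, unreadOnly) are illustrative:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of "query both and merge in memory": full-text hits come back
// as mail IDs, volatile metadata (read/unread) lives only in the DB,
// and the two are intersected after the search.
public class MergeSketch {

    static List<Long> unreadOnly(List<Long> luceneHits,
                                 Map<Long, Boolean> readFlagsFromDb) {
        return luceneHits.stream()
                .filter(id -> Boolean.FALSE.equals(readFlagsFromDb.get(id))) // keep unread
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> hits = Arrays.asList(1L, 2L, 3L);   // IDs from the index
        Map<Long, Boolean> read = new HashMap<>();     // flags from the DB
        read.put(1L, true);
        read.put(2L, false);
        read.put(3L, false);
        System.out.println(unreadOnly(hits, read));    // the unread subset
    }
}
```

The attraction is that a status flip is a single DB update and no
re-index; the cost is that result counts and paging get awkward, since
the filtering happens after Lucene has scored and ranked the hits.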
Best regards
Wojtek
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------