Antony Bowesman wrote:
We're planning to archive email over many years and have been looking
at using a DB to store mail meta data and Lucene for the indexed mail
data, or just Lucene on its own with email data and structure stored
as XML and the raw message stored in the file system.
For some customers, the volumes are likely to be well over 1 billion
mails over 10 years, so some partitioning of data is needed. At the
moment the thoughts are moving away from using a DB + Lucene to just
Lucene along with a file system representation of the complete message.
All searches will be against the index then the XML mail meta data is
loaded from the file system.
The archive is read only apart from bulk deletes, but one of the
requirements is for users to be able to label their own mail. Given
that a Lucene Document cannot be updated, I have thought about having
a separate Lucene index that has just the 3 terms (or some combination
of) userId + mailId + label.
That of course would mean joining searches from the main mail data
index and the label index.
Does anyone have any experience of using Lucene this way and is it a
realistic option of avoiding the DB at all? I'd rather the headache
of scaling just Lucene, which is a simple beast, than the whole bundle
of 'stuff' that comes with the database as well.
Use a Filter and a BitSet.
From the personal (per-user label) data, you build a Filter
(http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Filter.html)
which is then applied to searches on the main index.
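
As a rough sketch of the idea only, using plain java.util.BitSet and no
Lucene classes: the label data is turned into one bit per main-index
document, and that BitSet is ANDed with the hits of the main query. In
Lucene 2.3 a Filter subclass would return such a BitSet from
bits(IndexReader). The class LabelFilterSketch, its methods, and the
"userId/label" key are hypothetical names for illustration, and the
sketch assumes each mailId has already been resolved to its document
number in the main index.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the separate label index (not a Lucene class).
// It maps "userId/label" to the set of main-index document numbers.
class LabelFilterSketch {
    private final Map<String, BitSet> labelBits = new HashMap<String, BitSet>();
    private final int maxDoc; // number of documents in the main index

    LabelFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Record that a user put a label on the mail stored at docId.
    // Assumption: mailId -> docId lookup has been done by the caller.
    void addLabel(String userId, String label, int docId) {
        String key = userId + "/" + label;
        BitSet bits = labelBits.get(key);
        if (bits == null) {
            bits = new BitSet(maxDoc);
            labelBits.put(key, bits);
        }
        bits.set(docId);
    }

    // Plays the role of Filter.bits(IndexReader): one bit per allowed document.
    // Returns a copy so callers cannot modify the stored label data.
    BitSet bits(String userId, String label) {
        BitSet bits = labelBits.get(userId + "/" + label);
        return bits == null ? new BitSet(maxDoc) : (BitSet) bits.clone();
    }

    // The "join": keep only the query hits that also carry the label.
    static BitSet applyFilter(BitSet queryHits, BitSet filterBits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filterBits);
        return result;
    }
}
```

Note that document numbers in a real index change after deletes and
merges, so the BitSet has to be rebuilt against the current reader of
the main index rather than stored permanently.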
M.