Op Friday 11 April 2008 13:49:59 schreef Mathieu Lecarme: > Antony Bowesman a écrit : > > We're planning to archive email over many years and have been > > looking at using DB to store mail meta data and Lucene for the > > indexed mail data, or just Lucene on its own with email data and > > structure stored as XML and the raw message stored in the file > > system. > > > > For some customers, the volumes are likely to be well over 1 > > billion mails over > > 10 years, so some partitioning of data is needed. At the moment > > the thoughts > > are moving away from using a DB + Lucene to just Lucene along with > > a file system > > representation of the complete message. All searches will be > > against the index then the XML mail meta data is loaded from the > > file system. > > > > The archive is read only apart from bulk deletes, but one of the > > requirements is for users to be able to label their own mail. > > Given that a Lucene Document cannot be updated, I have thought > > about having a separate Lucene index that has just the 3 terms (or > > some combination of) userId + mailId + label. > > > > That of course would mean joining searches from the main mail data > > index and the label index. > > > > Does anyone have any experience of using Lucene this way and is it > > a realistic option of avoiding the DB at all? I'd rather the > > headache of scaling just Lucene, which is a simple beast, than the > > whole bundle of 'stuff' that comes with the database as well. > > Use Filter and BitSet. > From the personnal data, you build a Filter > (http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Fil >ter.html) wich is used in the main index.
With 1 billion mails, and possibly a Filter per user, you may want to use more compact filters than BitSets, which is currently possible in the development trunk of lucene. Regards, Paul Elschot --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]