Hi,

I have not worked at petascale (yet!), mostly on the scale of tens of terabytes, but I do think Lucene would be very helpful for such a use case. I would indeed suggest partitioning the index by user: it seems the most logical, straightforward way, and it also offers the security of insulating one user's emails from everyone else's.
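
To make the per-user partitioning concrete, here is a minimal sketch of how a user could be mapped to a dedicated index directory. It is only an illustration: the root path `/data/mail-index`, the function name `index_dir_for_user`, and the two-level hashed fan-out are my own assumptions, not something Lucene prescribes. The search layer would then open only the index found in that one directory.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical storage root for all per-user indexes (an assumption, not from this thread).
INDEX_ROOT = PurePosixPath("/data/mail-index")


def index_dir_for_user(user_id: str, root: PurePosixPath = INDEX_ROOT) -> PurePosixPath:
    """Return the directory that holds the index for exactly one user."""
    # Hash the user ID so the directory name is filesystem-safe and evenly spread.
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    # Two levels of fan-out (256 x 256 buckets) keep any single directory small.
    return root / digest[:2] / digest[2:4] / digest


if __name__ == "__main__":
    for user in ("alice@example.com", "bob@example.com"):
        print(user, "->", index_dir_for_user(user))
```

Because every query is routed to a single user's directory, one user's search can never touch another user's mail, and rarely used indexes can stay closed until they are needed.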

Take a look at Compass and Solr (both based on Lucene); they might be more oriented to your needs.

HTH,
Shashi

On Mon, Nov 23, 2009 at 9:35 PM, fulin tang <tangfu...@gmail.com> wrote:
> We are going to add full-text search for our mailbox service.
>
> The problem is we have more than 1 PB mails there, and obviously we
> don't want to add another PB storage for search service, so we hope
> the index data will be small enough for storage while the search keeps
> fast.
>
> The lucky is that every user just search with mails of their own, so
> we can split the data into a lot of indexes instead of keeping them in
> a big one.
>
> So, after all these concerns, the question is, is lucene a good
> choice for this? or which is the right way to do this? Does anyone
> have done this before?
>
> All opinions and comments are welcome!
>
> fulin
>
> --
> The beginning of a dream struggles at the edge of the city
> The far reaches of the heart hold fast in the instant of a footstep
> My fate has buried an eternity of loneliness