Re: best practice: 1.4 billions documents

2010-11-21 Thread Luca Rondanini
Hi David, thanks for your answer. It really helped a lot! So, you have an index with more than 2 billion segments. This is pretty much the answer I was searching for: Lucene alone is able to manage such a big index. What kind of problems do you have with the parallel searchers? I'm going to buil

RE: best practice: 1.4 billions documents

2010-11-21 Thread David Fertig
Actually, I've been bitten by a still-unresolved issue with the parallel searchers, and I recommend a MultiReader instead. We have a couple billion docs in our archives as well. Breaking them up by day worked well for us, but you'll need to do something. -Original Message- From: Luca Ronda
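
A minimal sketch of the "break them up by day" idea mentioned above, using only the Java standard library. The class name `DayShards`, the `index-YYYY-MM-DD` directory naming, and both helper methods are hypothetical and not taken from the thread; actually opening the selected directories (for example, combining their readers with a MultiReader) is deliberately left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps dates to per-day index directory names.
public class DayShards {
    // ISO format, e.g. 2010-11-21
    private static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE;

    // Assumed naming scheme: one index directory per calendar day.
    static String shardName(LocalDate day) {
        return "index-" + DAY.format(day);
    }

    // Names of the per-day shards covering an inclusive date range.
    // Returns an empty list when 'from' is after 'to'.
    static List<String> shardsFor(LocalDate from, LocalDate to) {
        List<String> names = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            names.add(shardName(d));
        }
        return names;
    }
}
```

With a layout like this, new documents go only to the current day's directory, and a date-restricted search only needs to open the shards returned by `shardsFor`.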

Re: best practice: 1.4 billions documents

2010-11-21 Thread Luca Rondanini
Thank you both! Johannes, Katta seems interesting, but I will need to solve the problem of "hot" updates to the index. Yonik, I see your point - so your suggestion would be to build an architecture based on ParallelMultiSearcher? On Sun, Nov 21, 2010 at 3:48 PM, Yonik Seeley wrote: > On Sun, No

Re: best practice: 1.4 billions documents

2010-11-21 Thread Yonik Seeley
On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini wrote: > Hi everybody, I really need some good advice! I need to index something like 1.4 billion documents in Lucene. I have experience with Lucene, but I've never worked with such a large number of documents. Also, this is just the number of docs

Re: best practice: 1.4 billions documents

2010-11-21 Thread Johannes Goll
Hi Luca, Katta is an open-source project that integrates Lucene with Hadoop: http://katta.sourceforge.net Johannes 2010/11/21 Luca Rondanini > Hi everybody, I really need some good advice! I need to index something like 1.4 billion documents in Lucene. I have experience with Lucene, but I'

best practice: 1.4 billions documents

2010-11-21 Thread Luca Rondanini
Hi everybody, I really need some good advice! I need to index something like 1.4 billion documents in Lucene. I have experience with Lucene, but I've never worked with such a large number of documents. Also, this is just the number of docs at "start-up": they are going to grow, and fast. I don't have to