Re: best practice: 1.4 billions documents

2010-11-29 Thread Ian Lea
use IndexSearcher with MultiReader? > > Regards > Ganesh > > - Original Message - > From: "Robert Muir" > To: > Sent: Saturday, November 27, 2010 1:28 AM > Subject: Re: best practice: 1.4 billions documents > > >> On Fri, Nov 26, 2010 at 12:4

Re: best practice: 1.4 billions documents

2010-11-29 Thread Ganesh
- Original Message - From: "Robert Muir" To: Sent: Saturday, November 27, 2010 1:28 AM Subject: Re: best practice: 1.4 billions documents > On Fri, Nov 26, 2010 at 12:49 PM, Uwe Schindler wrote: >> This is the problem for Fuzzy: each searcher expands the fuzzy quer

Re: best practice: 1.4 billions documents

2010-11-26 Thread Robert Muir
On Fri, Nov 26, 2010 at 12:49 PM, Uwe Schindler wrote: > This is the problem for Fuzzy: each searcher expands the fuzzy query to a > different Boolean Query and so the scores are not comparable - MultiSearcher > (but not Solr) tries to combine the resulting rewritten queries into one > query, so e

RE: best practice: 1.4 billions documents

2010-11-26 Thread Uwe Schindler
er@lucene.apache.org; Uwe Schindler > Subject: Re: best practice: 1.4 billions documents > > On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler wrote: > > (Fuzzy scores on > > MultiSearcher and Solr are totally wrong because each shard uses > > another rewritten query).

Re: best practice: 1.4 billions documents

2010-11-26 Thread Yonik Seeley
On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler wrote: > (Fuzzy scores on > MultiSearcher and Solr are totally wrong because each shard uses another > rewritten query). Hmmm, really? I thought that fuzzy scoring should just rely on edit distance? Oh wait, I think I see - it's because we can use

RE: best practice: 1.4 billions documents

2010-11-25 Thread Uwe Schindler
e eMail: u...@thetaphi.de > -Original Message- > From: Ganesh [mailto:emailg...@yahoo.co.in] > Sent: Thursday, November 25, 2010 9:55 AM > To: java-user@lucene.apache.org > Subject: Re: best practice: 1.4 billions documents > > Thanks for the input. > > My results

Re: best practice: 1.4 billions documents

2010-11-25 Thread Ganesh
Thanks for the input. My results are sorted by date and i am not much bothered about score. Will i still be in trouble? Regards Ganesh - Original Message - From: "Robert Muir" To: Sent: Thursday, November 25, 2010 1:45 PM Subject: Re: best practice: 1.4 billions document

Re: best practice: 1.4 billions documents

2010-11-25 Thread Robert Muir
On Thu, Nov 25, 2010 at 2:58 AM, Uwe Schindler wrote: > ParallelMultiSearcher as subclass of MultiSearcher has the same problems. > These are not crashes; rather, some queries do not return correctly > scored results. This especially affects all MultiTermQueries > (TermRang

RE: best practice: 1.4 billions documents

2010-11-24 Thread Uwe Schindler
ilto:emailg...@yahoo.co.in] > Sent: Thursday, November 25, 2010 6:44 AM > To: java-user@lucene.apache.org > Subject: Re: best practice: 1.4 billions documents > > Since there was a debate about using multisearcher, what about using > ParallelMultiSearcher? > > I am having indexe

Re: best practice: 1.4 billions documents

2010-11-24 Thread Ganesh
now I haven't faced any issue. I used Lucene 2.9 and recently upgraded to 3.0.2. Do I need to switch to MultiReader? Regards Ganesh - Original Message - From: "Luca Rondanini" To: Sent: Monday, November 22, 2010 11:29 PM Subject: Re: best practice: 1.4 billions docu

Re: best practice: 1.4 billions documents

2010-11-22 Thread Luca Rondanini
eheheheh, 1.4 billion documents = 1,400,000,000 documents for almost 2T = 2 terabytes = 2000 gigabytes on HD! On Mon, Nov 22, 2010 at 10:16 AM, wrote: > > of course I will distribute my index over many machines: > > store everything on > > one computer is just crazy, 1.4B docs is going to b

RE: best practice: 1.4 billions documents

2010-11-22 Thread spring
> of course I will distribute my index over many machines: > store everything on > one computer is just crazy, 1.4B docs is going to be an index > of almost 2T > (in my case) billion = giga in english billion = tera in non-english 2T docs = 2.000.000.000.000 docs... ;) AFAIK 2 ^ 32 - 1 docs is

Re: best practice: 1.4 billions documents

2010-11-22 Thread Luca Rondanini
earchers, indexing additional documents, or filling FieldCache in > parallel. > > > > Uwe > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -Original Messa

RE: best practice: 1.4 billions documents

2010-11-22 Thread Uwe Schindler
il.com [mailto:ysee...@gmail.com] On Behalf Of Yonik > Seeley > Sent: Monday, November 22, 2010 6:29 PM > To: java-user@lucene.apache.org > Subject: Re: best practice: 1.4 billions documents > > On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler wrote: > > The latest discussion

Re: best practice: 1.4 billions documents

2010-11-22 Thread Yonik Seeley
On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler wrote: > The latest discussion was more about MultiReader vs. MultiSearcher. > > But you are right, 1.4 B documents is not easy to handle, especially when your > index grows and you get to the 2.1 B marker, then no MultiSearcher or > whatever helps. > > O
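
The "2.1 B marker" mentioned above can be made concrete with a little arithmetic. The sketch below assumes the marker refers to Java's Integer.MAX_VALUE, since Lucene addresses documents with an int. The class name DocLimit, its methods, and the 500 million per-shard cap are hypothetical and exist only for this illustration; nothing here calls Lucene itself.

```java
public class DocLimit {
    // Assumption: the "2.1 B marker" is Java's Integer.MAX_VALUE (2,147,483,647),
    // because document numbers are plain Java ints.
    public static final long MAX_DOCS = Integer.MAX_VALUE;

    // How many more documents fit into one logical index before the limit is hit.
    public static long headroom(long currentDocs) {
        return MAX_DOCS - currentDocs;
    }

    // Smallest number of independent shards needed so that no shard
    // holds more than maxDocsPerShard documents (ceiling division).
    public static long shardsNeeded(long totalDocs, long maxDocsPerShard) {
        if (totalDocs < 0 || maxDocsPerShard <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and maxDocsPerShard > 0");
        }
        return (totalDocs + maxDocsPerShard - 1) / maxDocsPerShard;
    }

    public static void main(String[] args) {
        long docs = 1400000000L; // the 1.4 billion documents from the thread
        System.out.println("Limit per logical index: " + MAX_DOCS);
        System.out.println("Headroom at 1.4B docs:   " + headroom(docs));
        // Hypothetical cap of 500 million documents per shard.
        System.out.println("Shards at 500M each:     " + shardsNeeded(docs, 500000000L));
    }
}
```

With 1.4 billion documents already indexed, roughly 750 million more fit before a single logical index runs out of document numbers, which is why growth beyond that point needs separate shards rather than one combined reader.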

RE: best practice: 1.4 billions documents

2010-11-22 Thread Uwe Schindler
er@lucene.apache.org > Subject: Re: best practice: 1.4 billions documents > > Am I the only one who thinks this is not the way to go, MultiReader (or > MultiSearcher) is not going to fix your problems. Having 1.4B Documents on > one machine is a big number, does not matter how you

Re: best practice: 1.4 billions documents

2010-11-22 Thread eks dev
ling FieldCache in parallel. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: David Fertig [mailto:dfer...@cymfony.com] > > Sent: Monday,

RE: best practice: 1.4 billions documents

2010-11-22 Thread Uwe Schindler
-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: David Fertig [mailto:dfer...@cymfony.com] > Sent: Monday, November 22, 2010 5:57 PM > To: java-user@lucene.apache.org > Subject: RE: best practice: 1.4 billions documents >

RE: best practice: 1.4 billions documents

2010-11-22 Thread David Fertig
--- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Monday, November 22, 2010 11:19 AM To: java-user@lucene.apache.org Subject: RE: best practice: 1.4 billions documents There is no reason to use MultiSearcher instead of the much more consistent and effective MultiReader! We (Robert and me) are

RE: best practice: 1.4 billions documents

2010-11-22 Thread Uwe Schindler
remen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: David Fertig [mailto:dfer...@cymfony.com] > Sent: Monday, November 22, 2010 4:54 PM > To: java-user@lucene.apache.org > Subject: RE: best practice: 1.4 billions documents > > >> We have a couple

RE: best practice: 1.4 billions documents

2010-11-22 Thread David Fertig
ni [mailto:luca.rondan...@gmail.com] Sent: Monday, November 22, 2010 1:47 AM To: java-user@lucene.apache.org Subject: Re: best practice: 1.4 billions documents Hi David, thanks for your answer. it really helped a lot! so, you have an index with more than 2 billion segments. this is pretty

Re: best practice: 1.4 billions documents

2010-11-22 Thread Erick Erickson
iginal Message- > > From: Luca Rondanini [mailto:luca.rondan...@gmail.com] > > Sent: Sunday, November 21, 2010 8:13 PM > > To: java-user@lucene.apache.org; yo...@lucidimagination.com > > Subject: Re: best practice: 1.4 billions documents > > > > thank you bot

Re: best practice: 1.4 billions documents

2010-11-21 Thread Luca Rondanini
M > To: java-user@lucene.apache.org; yo...@lucidimagination.com > Subject: Re: best practice: 1.4 billions documents > > thank you both! > > Johannes, katta seems interesting but I will need to solve the problems of > "hot" updates to the index > > Yonik, I see

RE: best practice: 1.4 billions documents

2010-11-21 Thread David Fertig
From: Luca Rondanini [mailto:luca.rondan...@gmail.com] Sent: Sunday, November 21, 2010 8:13 PM To: java-user@lucene.apache.org; yo...@lucidimagination.com Subject: Re: best practice: 1.4 billions documents thank you both! Johannes, katta seems interesting but I will need to solve the problems of "

Re: best practice: 1.4 billions documents

2010-11-21 Thread Luca Rondanini
thank you both! Johannes, katta seems interesting but I will need to solve the problems of "hot" updates to the index Yonik, I see your point - so your suggestion would be to build an architecture based on ParallelMultiSearcher? On Sun, Nov 21, 2010 at 3:48 PM, Yonik Seeley wrote: > On Sun, No

Re: best practice: 1.4 billions documents

2010-11-21 Thread Yonik Seeley
On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini wrote: > Hi everybody, > > I really need some good advice! I need to index in lucene something like 1.4 > billions documents. I had experience in lucene but I've never worked with > such a big number of documents. Also this is just the number of docs

Re: best practice: 1.4 billions documents

2010-11-21 Thread Johannes Goll
Hi Luca, Katta is an open-source project that integrates Lucene with Hadoop http://katta.sourceforge.net Johannes 2010/11/21 Luca Rondanini > Hi everybody, > > I really need some good advice! I need to index in lucene something like > 1.4 > billions documents. I had experience in lucene but I'