So, should I use IndexSearcher with MultiReader?

Regards
Ganesh
- Original Message -
From: "Robert Muir"
To:
Sent: Saturday, November 27, 2010 1:28 AM
Subject: Re: best practice: 1.4 billions documents

On Fri, Nov 26, 2010 at 12:49 PM, Uwe Schindler wrote:
> This is the problem for Fuzzy: each searcher expands the fuzzy query to a
> different BooleanQuery, and so the scores are not comparable - MultiSearcher
> (but not Solr) tries to combine the resulting rewritten queries into one
> query [...]
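As a rough illustration of why the per-shard rewrites differ, here is a
minimal sketch assuming the Lucene 3.0.x API and two hypothetical shard
directories (paths, field and term are illustrative only): the same
FuzzyQuery, rewritten against each shard's term dictionary, can expand to
different BooleanQueries with different term statistics.

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;

public class FuzzyRewriteDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical shard locations, for illustration only.
    IndexReader shard1 = IndexReader.open(FSDirectory.open(new File("/data/shard1")));
    IndexReader shard2 = IndexReader.open(FSDirectory.open(new File("/data/shard2")));

    FuzzyQuery fuzzy = new FuzzyQuery(new Term("title", "lucene"), 0.7f);

    // Each rewrite expands the fuzzy query against that reader's term
    // dictionary, so the resulting queries (and their IDF statistics) can
    // differ between shards - which is why the scores are not comparable.
    Query onShard1 = fuzzy.rewrite(shard1);
    Query onShard2 = fuzzy.rewrite(shard2);
    System.out.println("shard1: " + onShard1);
    System.out.println("shard2: " + onShard2);

    shard1.close();
    shard2.close();
  }
}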
On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler wrote:
> (Fuzzy scores on MultiSearcher and Solr are totally wrong because each
> shard uses another rewritten query).

Hmmm, really? I thought that fuzzy scoring should just rely on edit distance?
Oh wait, I think I see - it's because we can use [...]
From: Ganesh [mailto:emailg...@yahoo.co.in]
Sent: Thursday, November 25, 2010 9:55 AM
To: java-user@lucene.apache.org
Subject: Re: best practice: 1.4 billions documents

Thanks for the input.

My results are sorted by date and I am not much bothered about the score.
Will I still be in trouble?

Regards
Ganesh
- Original Message -
From: "Robert Muir"
To:
Sent: Thursday, November 25, 2010 1:45 PM
Subject: Re: best practice: 1.4 billions documents

On Thu, Nov 25, 2010 at 2:58 AM, Uwe Schindler wrote:
> ParallelMultiSearcher, as a subclass of MultiSearcher, has the same
> problems. These are not crashes, but rather that some queries do not
> return correctly scored results. This affects especially all
> MultiTermQueries (TermRangeQuery, [...]
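Since the results here are sorted purely by date, relevance scores never
enter the ranking. A minimal sketch of such a search, assuming the Lucene
3.0.x API, a hypothetical index under /data/index and a long-valued "date"
field (all names illustrative only):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class DateSortedSearch {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("/data/index")));
    IndexSearcher searcher = new IndexSearcher(reader);

    // Sort newest-first on a numeric "date" field; relevance scores are
    // not used for ordering at all.
    Sort byDate = new Sort(new SortField("date", SortField.LONG, true));
    TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")),
                                   null, 20, byDate);
    System.out.println("total hits: " + hits.totalHits);

    searcher.close();
    reader.close();
  }
}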
From: Ganesh [mailto:emailg...@yahoo.co.in]
Sent: Thursday, November 25, 2010 6:44 AM
To: java-user@lucene.apache.org
Subject: Re: best practice: 1.4 billions documents

Since there was a debate about using MultiSearcher, what about using
ParallelMultiSearcher?

I am having indexes [...] and so far I haven't faced any issue. I used
Lucene 2.9 and recently upgraded to 3.0.2.

Do I need to switch to MultiReader?

Regards
Ganesh
- Original Message -
From: "Luca Rondanini"
To:
Sent: Monday, November 22, 2010 11:29 PM
Subject: Re: best practice: 1.4 billions documents

eheheheh,
1.4 billion documents = 1,400,000,000 documents for almost 2T = 2
terabytes = 2000 gigabytes on disk!

On Mon, Nov 22, 2010 at 10:16 AM, wrote:
> > of course I will distribute my index over many machines: storing
> > everything on one computer is just crazy, 1.4B docs is going to be an
> > index of almost 2T (in my case)
> of course I will distribute my index over many machines: storing
> everything on one computer is just crazy, 1.4B docs is going to be an
> index of almost 2T (in my case)

billion = giga in English
billion = tera in many non-English languages
2T docs = 2.000.000.000.000 docs... ;)

AFAIK 2^31 - 1 docs is the hard limit for a single Lucene index.
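For reference, that ceiling exists because Lucene document numbers are Java
ints. A minimal sketch assuming the Lucene 3.0.x API and a hypothetical
index path:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class DocLimit {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("/data/index")));

    // maxDoc() returns an int, which is what caps a single index (or a
    // composite reader over several indexes) at Integer.MAX_VALUE,
    // i.e. roughly 2.1 billion documents.
    System.out.println("maxDoc = " + reader.maxDoc()
        + " of at most " + Integer.MAX_VALUE);

    reader.close();
  }
}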
[...] searchers, indexing additional documents, or filling FieldCache in
parallel.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
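On that last point, a minimal sketch of pre-filling the FieldCache for a
sort field before a reader goes into service, assuming the Lucene 3.0.x API
and a hypothetical long-valued "date" field (path and field name are
illustrative only):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.FSDirectory;

public class WarmFieldCache {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("/data/index")));

    // Loading the per-document values once populates the FieldCache entry
    // for this reader, so the first real sorted query does not pay that cost.
    long[] dates = FieldCache.DEFAULT.getLongs(reader, "date");
    System.out.println("warmed " + dates.length + " date values");

    reader.close();
  }
}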
-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Monday, November 22, 2010 6:29 PM
To: java-user@lucene.apache.org
Subject: Re: best practice: 1.4 billions documents

On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler wrote:
> The latest discussion was more about MultiReader vs. MultiSearcher.
>
> But you are right, 1.4 B documents is not easy to handle, especially when
> your index grows and you get to the 2.1 B marker; then no MultiSearcher or
> whatever helps.
>
> [...]
To: java-user@lucene.apache.org
Subject: Re: best practice: 1.4 billions documents

Am I the only one who thinks this is not the way to go? MultiReader (or
MultiSearcher) is not going to fix your problems. Having 1.4B documents on
one machine is a big number, no matter how you [...]
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Monday, November 22, 2010 11:19 AM
To: java-user@lucene.apache.org
Subject: RE: best practice: 1.4 billions documents

There is no reason to use MultiSearcher instead of the much more consistent
and effective MultiReader! We (Robert and I) are [...]
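A minimal sketch of that recommendation, assuming the Lucene 3.0.x API and
two hypothetical shard directories: open one IndexReader per shard, wrap
them in a single MultiReader, and search through one IndexSearcher, so that
a multi-term query is rewritten once against the combined term dictionary
and scores stay consistent.

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class MultiReaderSearch {
  public static void main(String[] args) throws Exception {
    // Hypothetical shard locations, for illustration only.
    IndexReader shard1 = IndexReader.open(FSDirectory.open(new File("/data/shard1")));
    IndexReader shard2 = IndexReader.open(FSDirectory.open(new File("/data/shard2")));

    // One logical reader over all shards; with this constructor, closing
    // the MultiReader also closes the sub-readers.
    MultiReader all = new MultiReader(new IndexReader[] { shard1, shard2 });
    IndexSearcher searcher = new IndexSearcher(all);

    TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
    System.out.println("total hits: " + hits.totalHits);

    searcher.close();
    all.close();
  }
}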
From: Luca Rondanini [mailto:luca.rondan...@gmail.com]
Sent: Monday, November 22, 2010 1:47 AM
To: java-user@lucene.apache.org
Subject: Re: best practice: 1.4 billions documents

Hi David, thanks for your answer. It really helped a lot! So, you have an
index with more than 2 billion segments. This is pretty [...]
From: Luca Rondanini [mailto:luca.rondan...@gmail.com]
Sent: Sunday, November 21, 2010 8:13 PM
To: java-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: best practice: 1.4 billions documents

Thank you both!

Johannes, katta seems interesting but I will need to solve the problems of
"hot" updates to the index.

Yonik, I see your point - so your suggestion would be to build an
architecture based on ParallelMultiSearcher?
On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini wrote:
> Hi everybody,
>
> I really need some good advice! I need to index in Lucene something like
> 1.4 billion documents. I have experience with Lucene but I've never worked
> with such a big number of documents. Also, this is just the number of
> docs [...]
Hi Luca,
Katta is an open-source project that integrates Lucene with Hadoop
http://katta.sourceforge.net
Johannes
2010/11/21 Luca Rondanini
> Hi everybody,
>
> I really need some good advice! I need to index in Lucene something like
> 1.4 billion documents. [...]