-----Original Message-----
From: Anshum [mailto:ansh...@gmail.com]
Sent: Wednesday, August 11, 2010 10:38 AM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs

So, you didn't really use the setRAMBufferSizeMB option? Any reasons for that?

--
Anshum Gupta
http://ai-cafe.blogspot.com
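A minimal sketch of what flushing by RAM usage rather than by document count looks like, against the Lucene 3.0-era API that was current when this thread was written; the index path and the 256 MB figure are illustrative assumptions, not settings from the thread:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RamBufferFlush {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/tmp/docs-index")), // illustrative path
                new StandardAnalyzer(Version.LUCENE_30),
                true,                                          // create a new index
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Flush buffered documents once they occupy ~256 MB of heap,
        // rather than after a fixed number of documents.
        writer.setRAMBufferSizeMB(256.0);
        writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);

        // ... writer.addDocument(doc) for each document ...
        writer.close();
    }
}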
-----Original Message-----
From: Shelly_Singh [mailto:shelly_si...@infosys.com]
Sent: Wednesday, August 11, 2010 10:28 AM
To: java-user@lucene.apache.org
Subject: RE: Scaling Lucene to 1bln docs

My final settings are:
1. 1.5 gig RAM to the jvm out of 2GB available for my ...

... compare with regular docs.
-----Original Message-----
From: Pablo Mendes [mailto:pablomen...@gmail.com]
Sent: Tuesday, August 10, 2010 7:22 PM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs

Shelly,
Do you mind sharing with the list the final settings you used for your best
results?
-----Original Message-----
From: Shelly_Singh [mailto:shelly_si...@infosys.com]
Sent: Tuesday, August 10, 2010 19:11:11
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: RE: Scaling Lucene to 1bln docs

Hi folks,

Thanks for the excellent support and guidance on my very first day on this
mailing list... At end of day, I have very optimistic results. 100bln search
in less tha...

Regards,
Shelly
-----Original Message-----
From: Danil ŢORIN [mailto:torin...@gmail.com]
Sent: Tuesday, August 10, 2010 6:52 PM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs

That won't work... if you'll have something like "A Basic Crazy
Document E-something F-something G-something... you get the point", it
will go to all shards, so the whole point of shards will be
compromised...
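Danil's point is that routing by the terms a document contains sends a many-term document to many shards. A routing scheme that avoids this, sketched here purely as an assumption rather than anything proposed in the thread, is to hash a stable document id so that each whole document lands in exactly one shard:

// Hypothetical routing sketch: one shard per document, chosen by id hash,
// so no document is duplicated across shards.
public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    public int shardFor(String docId) {
        // Mask the sign bit instead of Math.abs, which overflows
        // on Integer.MIN_VALUE.
        return (docId.hashCode() & 0x7fffffff) % numShards;
    }
}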
... of another option.

Comments welcome.

-----Original Message-----
From: Danil ŢORIN [mailto:torin...@gmail.com]
Sent: Tuesday, August 10, 2010 6:11 PM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs

I'd second t...
-----Original Message-----
From: Dan OConnor [mailto:docon...@acquiremedia.com]
Sent: Tuesday, August 10, 2010 6:02 PM
To: java-user@lucene.apache.org
Subject: RE: Scaling Lucene to 1bln docs

Shelly:

You wouldn't necessarily have to use a multisearcher. A suggested alternative
is:
- shard into 10 in...
... an efficient merging algorithm.

Regards,
Dan
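A sketch of the "search each shard separately, then merge" alternative Dan is outlining; since his list is truncated in the archive, the helper class and the simple top-k merge below are assumptions, not his actual steps:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class ShardedSearch {

    // Pairs a hit with its shard: a Lucene doc id is only meaningful
    // within the index that produced it.
    static class ShardHit {
        final int shard;
        final ScoreDoc hit;
        ShardHit(int shard, ScoreDoc hit) { this.shard = shard; this.hit = hit; }
    }

    static List<ShardHit> searchShards(IndexSearcher[] shards, Query query, int k)
            throws Exception {
        List<ShardHit> all = new ArrayList<ShardHit>();
        for (int i = 0; i < shards.length; i++) {
            TopDocs top = shards[i].search(query, k);   // local top-k per shard
            for (ScoreDoc sd : top.scoreDocs) {
                all.add(new ShardHit(i, sd));
            }
        }
        // Keep the k best hits overall, highest score first.
        Collections.sort(all, new Comparator<ShardHit>() {
            public int compare(ShardHit a, ShardHit b) {
                return Float.compare(b.hit.score, a.hit.score);
            }
        });
        return all.subList(0, Math.min(k, all.size()));
    }
}

One caveat worth noting: scores from separate shards are only approximately comparable, because each shard computes its IDF statistics from its own documents.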
-----Original Message-----
From: Anshum [mailto:ansh...@gmail.com]
Sent: Tuesday, August 10, 2010 5:59 PM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs

Searching on all indices shouldn't be that bad an idea instead of searching
a single huge index, especially considering you have a constraint on the
usable memory...
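For reference, the multisearcher route being discussed here would look roughly like this on the Lucene 3.0-era API (MultiSearcher was later deprecated in favor of an IndexSearcher over a MultiReader); the shard count and paths are illustrative assumptions:

import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class MultiSearch {
    public static TopDocs searchAll(Query query) throws Exception {
        int numShards = 10;                       // one index per shard
        Searchable[] shards = new Searchable[numShards];
        for (int i = 0; i < numShards; i++) {
            IndexReader reader = IndexReader.open(
                    FSDirectory.open(new File("/data/shard-" + i)), true); // read-only
            shards[i] = new IndexSearcher(reader);
        }
        // MultiSearcher presents the shard indices as one logical index.
        MultiSearcher searcher = new MultiSearcher(shards);
        return searcher.search(query, 10);        // global top-10
    }
}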
-----Original Message-----
From: Shelly_Singh [mailto:shelly_si...@infosys.com]
Sent: Tuesday, August 10, 2010 8:20 AM
To: java-user@lucene.apache.org
Subject: RE: Scaling Lucene to 1bln docs

No sort. I will need relevance based on TF. If I shard, I will have to search
in all indices.
-----Original Message-----
From: anshum.gu...@naukri.com [mailto:ansh...@gmail.com]
Sent: Tuesday, August 10, 2010 1:54 PM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs

Would like to know, are you using a particular type of sort? Do you need to...
-----Original Message-----
From: Shelly_Singh [mailto:shelly_si...@infosys.com]
Sent: Tuesday, 10 Aug 2010 13:31:38
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: RE: Scaling Lucene to 1bln docs

Hi Anshum,

I am already running with the 'setCompoundFile' option off.
And thanks for pointing out mergeFactor. I had tried a higher mergeFactor
coup...

...me but the search time is highly unacceptable.

Help again.
-----Original Message-----
From: Shelly_Singh [mailto:shelly_si...@infosys.com]
To: java-user@lucene.apache.org
Subject: RE: Scaling Lucene to 1bln docs

...multisearcher for searching. Will that help?

-----Original Message-----
From: Danil ŢORIN [mailto:torin...@gmail.com]
Sent: Tuesday, August 10, 2010 1:06 PM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs
The problem actually won't be the indexing part.
Searching such a large dataset will require a LOT of memory.
If you'll need sorting or faceting on one of the fields, the jvm will explode ;)
Also GC times on a large jvm heap are pretty disturbing (if you care
about your search performance).
So I'd advise...
-----Original Message-----
From: Anshum [mailto:ansh...@gmail.com]
Sent: Tuesday, August 10, 2010 12:55 PM
To: java-user@lucene.apache.org
Subject: Re: Scaling Lucene to 1bln docs

Hi Shelly,
That seems like a reasonable data set size. I'd suggest you increase your
mergeFactor, as a mergeFactor of 10 says you are only buffering 10 docs in
memory before writing them to a file (and incurring I/O). You could actually
flush by RAM usage instead of a doc count. Turn off using the Compound file
format...
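Anshum's three suggestions (raise mergeFactor, flush by RAM usage, turn off compound files) map onto the Lucene 3.0-era IndexWriter calls below; the concrete values are illustrative assumptions. Strictly speaking, the number of docs buffered in memory is governed by setMaxBufferedDocs rather than by mergeFactor, which controls how many segments accumulate before a merge:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class WriterTuning {
    public static IndexWriter openTunedWriter(File path) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(path),
                new StandardAnalyzer(Version.LUCENE_30),
                true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        writer.setMergeFactor(30);          // merge less often than the default of 10
        writer.setRAMBufferSizeMB(128.0);   // flush by RAM usage, not doc count
        writer.setUseCompoundFile(false);   // skip packing segments into .cfs files
        return writer;
    }
}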
-----Original Message-----
From: Shelly_Singh [mailto:shelly_si...@infosys.com]
To: java-user@lucene.apache.org
Subject: Scaling Lucene to 1bln docs

Hi,

I am developing an application which uses Lucene for indexing and searching 1
bln documents. (The document size is very small though. Each document has a
single field of 5-10 words, so I believe that my data size is within the tested
limits.)

I am using the following configuration:
1. ...