Re: Scaling Lucene to 1bln docs

2010-08-16 Thread Danil ŢORIN
m [mailto:ansh...@gmail.com] > Sent: Wednesday, August 11, 2010 10:38 AM > To: java-user@lucene.apache.org > Subject: Re: Scaling Lucene to 1bln docs > > So, you didn't really use the setRamBuffer.. ? > Any reasons for that? > > -- > Anshum Gupta > http://ai-cafe.blogspot.c

RE: Scaling Lucene to 1bln docs

2010-08-16 Thread Shelly_Singh
nfosys.com Phone: (M) 91 992 369 7200, (VoIP)2022978622 -Original Message- From: Anshum [mailto:ansh...@gmail.com] Sent: Wednesday, August 11, 2010 10:38 AM To: java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs So, you didn't really use the setRamBuffer.. ? Any r

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
: Scaling Lucene to 1bln docs So, you didn't really use the setRamBuffer.. ? Any reasons for that? -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Aug 11, 2010 at 10:28 AM, Shelly_Singh wrote: > My final settings are: > 1. 1.5 gig RAM to the jvm out of 2GB available for my

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Anshum
- > From: Pablo Mendes [mailto:pablomen...@gmail.com] > Sent: Tuesday, August 10, 2010 7:22 PM > To: java-user@lucene.apache.org > Subject: Re: Scaling Lucene to 1bln docs > > Shelly, > Do you mind sharing with the list the final settings you used for your best > results?

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
compare with regular docs. -Original Message- From: Pablo Mendes [mailto:pablomen...@gmail.com] Sent: Tuesday, August 10, 2010 7:22 PM To: java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs Shelly, Do you mind sharing with the list the final settings you used for your best

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Pablo Mendes
> > Regards, > Shelly > > -Original Message- > From: Danil ŢORIN [mailto:torin...@gmail.com] > Sent: Tuesday, August 10, 2010 6:52 PM > To: java-user@lucene.apache.org > Subject: Re: Scaling Lucene to 1bln docs > > That won't work...if you'll have some

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread anshum.gu...@naukri.com
2010 19:11:11 To: java-user@lucene.apache.org Reply-To: java-user@lucene.apache.org Subject: RE: Scaling Lucene to 1bln docs Hi folks, Thanks for the excellent support and guidance on my very first day on this mailing list... At the end of the day, I have very encouraging results. 100bln search in less tha

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
To: java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs That won't work... if you have something like "A Basic Crazy Document E-something F-something G-something... you get the point", it will go to all shards, so the whole point of shards will be compromi
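The objection above can be made concrete with a small sketch. The proposal being answered is cut off in this listing, so the scheme shown here is an assumption: one shard per first letter of each term in a document. `PrefixShardSketch` and `shardsFor` are hypothetical names and are not part of Lucene.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code: under an assumed scheme that
// keys shards on the first letter of each term, list the shards one
// document would have to be written to.
final class PrefixShardSketch {

    // Returns the distinct lower-cased first letters of the whitespace-separated tokens.
    static Set<Character> shardsFor(String text) {
        Set<Character> shards = new TreeSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                shards.add(Character.toLowerCase(token.charAt(0)));
            }
        }
        return shards;
    }
}
```

For the example document quoted in the message, `shardsFor("A Basic Crazy Document E-something F-something G-something")` yields seven distinct keys, so that one short document would be copied into seven shards.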

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
anil ŢORIN [mailto:torin...@gmail.com] Sent: Tuesday, August 10, 2010 6:52 PM To: java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs That won't work... if you have something like "A Basic Crazy Document E-something F-something G-something... you get the point" i

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Danil ŢORIN
of another option. > > Comments welcome. > > > -----Original Message- > From: Danil ŢORIN [mailto:torin...@gmail.com] > Sent: Tuesday, August 10, 2010 6:11 PM > To: java-user@lucene.apache.org > Subject: Re: Scaling Lucene to 1bln docs > > I'd second t

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
an efficient merging algorithm. > > Regards, > Dan > > > > > -Original Message- > From: Shelly_Singh [mailto:shelly_si...@infosys.com] > Sent: Tuesday, August 10, 2010 8:20 AM > To: java-user@lucene.apache.org > Subject: RE: Scaling Lucene to 1bln docs > > No

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread prashant ullegaddi
efficient merging algorithm. > > -Original Message- > From: Dan OConnor [mailto:docon...@acquiremedia.com] > Sent: Tuesday, August 10, 2010 6:02 PM > To: java-user@lucene.apache.org > Subject: RE: Scaling Lucene to 1bln docs > > Shelly: > > You wouldn't

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Danil ŢORIN
sage- > From: Shelly_Singh [mailto:shelly_si...@infosys.com] > Sent: Tuesday, August 10, 2010 8:20 AM > To: java-user@lucene.apache.org > Subject: RE: Scaling Lucene to 1bln docs > > No sort. I will need relevance based on TF. If I shard, I will have to search > in all indi

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
. -Original Message- From: Dan OConnor [mailto:docon...@acquiremedia.com] Sent: Tuesday, August 10, 2010 6:02 PM To: java-user@lucene.apache.org Subject: RE: Scaling Lucene to 1bln docs Shelly: You wouldn't necessarily have to use a multisearcher. A suggested alternative is: - shard into 10 in
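The rest of this suggestion is truncated in the listing, so what follows is only a sketch of the general idea it opens with: spread documents over a fixed number of indices, query each one, and merge the per-index hits into a single ranking. `ShardSketch`, `Hit`, `shardFor` and `mergeTopK` are hypothetical names, not Lucene APIs, and the merge assumes scores from different shards are directly comparable, which may not hold if term statistics differ between shards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: routes documents to shards and
// merges per-shard hits into one global top-k list.
final class ShardSketch {

    // A scored hit, as a shard-local search might report it.
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Assign a document to one of numShards indices by hashing its id.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Concatenate the per-shard result lists and keep the k highest-scoring hits.
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

With ten shards, each shard only needs to return its own top k hits, so the merge step handles at most 10 × k candidates regardless of how many documents are indexed.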

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
...@gmail.com] Sent: Tuesday, August 10, 2010 5:59 PM To: java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs Searching all indices shouldn't be a bad idea compared with searching a single huge index, especially considering you have a constraint on the usable memory

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Dan OConnor
ly_si...@infosys.com] Sent: Tuesday, August 10, 2010 8:20 AM To: java-user@lucene.apache.org Subject: RE: Scaling Lucene to 1bln docs No sort. I will need relevance based on TF. If I shard, I will have to search in all indices. -Original Message- From: anshum.gu...@naukri.com [mailto:ansh...@gmai

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Anshum
> -Original Message- > From: anshum.gu...@naukri.com [mailto:ansh...@gmail.com] > Sent: Tuesday, August 10, 2010 1:54 PM > To: java-user@lucene.apache.org > Subject: Re: Scaling Lucene to 1bln docs > > Would like to know, are you using a particular type of sort? Do you need to

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
No sort. I will need relevance based on TF. If I shard, I will have to search in all indices. -Original Message- From: anshum.gu...@naukri.com [mailto:ansh...@gmail.com] Sent: Tuesday, August 10, 2010 1:54 PM To: java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs Would

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread findbestopensource
me but the search > time is highly unacceptable. > > Help again. > > -Original Message- > From: Anshum [mailto:ansh...@gmail.com] > Sent: Tuesday, August 10, 2010 12:55 PM > To: java-user@lucene.apache.org > Subject: Re: Scaling Lucene to 1bln docs > > Hi Shelly, > That se

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Michael McCandless
o: java-user@lucene.apache.org > Reply-To: java-user@lucene.apache.org > Subject: RE: Scaling Lucene to 1bln docs > > Hi Anshum, > > I am already running with the 'setCompoundFile' option off. > And thanks for pointing out mergeFactor. I had tried a higher mergeFa

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread anshum.gu...@naukri.com
, 10 Aug 2010 13:31:38 To: java-user@lucene.apache.org Reply-To: java-user@lucene.apache.org Subject: RE: Scaling Lucene to 1bln docs Hi Anshum, I am already running with the 'setCompoundFile' option off. And thanks for pointing out mergeFactor. I had tried a higher mergeFactor coup

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
multisearcher for searching. Will that help? -Original Message- From: Danil ŢORIN [mailto:torin...@gmail.com] Sent: Tuesday, August 10, 2010 1:06 PM To: java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs The problem actually won't be the indexing part. Searching such

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
java-user@lucene.apache.org Subject: Re: Scaling Lucene to 1bln docs Hi Shelly, That seems like a reasonable data set size. I'd suggest you increase your mergeFactor, as a mergeFactor of 10 means you are only buffering 10 docs in memory before writing them to a file (and incurring I/O). You could actual

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Danil ŢORIN
The problem actually won't be the indexing part. Searching such a large dataset will require a LOT of memory. If you need sorting or faceting on one of the fields, the JVM will explode ;) Also, GC times on a large JVM heap are pretty disturbing (if you care about your search performance). So I'd advise
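A rough back-of-the-envelope figure shows why sorting is singled out here. The sketch assumes that sorting on a field keeps one fixed-width value per document in memory (4 bytes for an int key); the class and method names are hypothetical, and real overhead would be higher once object headers and other caches are counted.

```java
// Hypothetical estimate, not Lucene code: memory needed to hold one
// fixed-width sort value per document for the whole index.
final class SortMemorySketch {

    // Bytes required for numDocs values of bytesPerValue each; throws on overflow.
    static long bytesForSortField(long numDocs, int bytesPerValue) {
        return Math.multiplyExact(numDocs, (long) bytesPerValue);
    }

    public static void main(String[] args) {
        long bytes = bytesForSortField(1_000_000_000L, Integer.BYTES);
        // Prints the requirement in GiB for a single int sort field over 1 bln docs.
        System.out.printf("%.2f GiB%n", bytes / (1024.0 * 1024 * 1024));
    }
}
```

For 1 bln documents that is 4,000,000,000 bytes for a single int sort field, well above the 1.5 GB heap mentioned elsewhere in this thread.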

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Anshum
Hi Shelly, That seems like a reasonable data set size. I'd suggest you increase your mergeFactor, as a mergeFactor of 10 means you are only buffering 10 docs in memory before writing them to a file (and incurring I/O). You could actually flush by RAM usage instead of by doc count. Turn off using the Co

Scaling Lucene to 1bln docs

2010-08-09 Thread Shelly_Singh
Hi, I am developing an application which uses Lucene for indexing and searching 1 bln documents. (The document size is very small, though: each document has a single field of 5-10 words, so I believe that my data size is within the tested limits.) I am using the following configuration: 1.