RE: Bettering search performance

2010-08-27 Thread Shelly_Singh
On 2010-08-27 at 05:34 +0200, Shelly_Singh wrote: > I have a Lucene index of 100 million documents. [...] total index size is > 7GB. [...] > I get a response time of over 2 seconds. How many documents match such a query, and how many of those documents do you process (i.e. extract a
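
For context, the distinction the reply is drawing: the number of documents that *match* a query can be far larger than the number you actually retrieve, and retrieval is usually the expensive part. A minimal sketch, assuming the Lucene 3.x-era Java API and hypothetical index path and field names, that reports both numbers:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class MatchCount {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(
                    FSDirectory.open(new File("/path/to/index")), true); // read-only
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new TermQuery(new Term("name", "foo")); // hypothetical field/term
            TopDocs top = searcher.search(query, 10);  // collect only the top 10 hits
            System.out.println("matched=" + top.totalHits
                    + " retrieved=" + top.scoreDocs.length);
            searcher.close();
            reader.close();
        }
    }

Only the retrieved hits pay the cost of loading stored fields via searcher.doc(); totalHits comes for free as a side effect of collection.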

Bettering search performance

2010-08-26 Thread Shelly_Singh
Hi, I have a Lucene index of 100 million documents, but the document size is very small - 5 fields with 1 or 2 terms each. Only 1 field is analyzed; the others are simply indexed. The index is optimized to 2 segments and the total index size is 7GB. I open a searcher with a termInfosDivisor
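
The termInfosDivisor mentioned here trims the in-memory term index: with a divisor of N the reader loads only every Nth indexed term, cutting RAM use at a small cost in term-lookup speed. A minimal sketch, assuming the Lucene 3.x-era overload of IndexReader.open() and a hypothetical divisor and path:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class DivisorReader {
        public static void main(String[] args) throws Exception {
            int divisor = 4; // hypothetical: keep ~1/4 of the term index in RAM
            IndexReader reader = IndexReader.open(
                    FSDirectory.open(new File("/path/to/index")),
                    null,     // default IndexDeletionPolicy
                    true,     // open read-only
                    divisor);
            IndexSearcher searcher = new IndexSearcher(reader);
            // ... run queries ...
            searcher.close();
            reader.close();
        }
    }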

RE: Sorting a Lucene index

2010-08-24 Thread Shelly_Singh
A lot of work has been put into making Lucene fast, by very bright people. See if they've already solved your problem for you... Best, Erick. On Thu, Aug 19, 2010 at 1:51 AM, Shelly_Singh wrote: > Hi Anshum, > > I require sorted results for all my queries, and the field on which I need > sorting

RE: Sorting a Lucene index

2010-08-18 Thread Shelly_Singh
time, or is it a presumption? -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Aug 18, 2010 at 5:12 PM, Shelly_Singh wrote: > Hi, > > I have a Lucene index that contains a numeric field along with certain > other fields. The order of incoming documents is random and unpredictable.

TermQuery and ConstantScoreQuery on TermsFilter

2010-08-18 Thread Shelly_Singh
Hi, In my Lucene index, I want to search on a field, but the score or order of returned documents is not important. What is important is which documents are returned. As I do not need the score or even default sorting (order by docid), what is the best way to write the query? I compared the perf
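
When only the set of matching documents matters, wrapping a filter in a ConstantScoreQuery lets Lucene skip scoring entirely. A minimal sketch of the two variants being compared, assuming the Lucene 3.x-era API (TermsFilter ships in the lucene-queries contrib jar) and a hypothetical field and term:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TermsFilter;

    public class MatchOnly {
        public static void main(String[] args) {
            Term term = new Term("name", "foo"); // hypothetical field and value

            // Variant 1: plain TermQuery - every hit is scored even if the
            // caller ignores the score.
            Query scored = new TermQuery(term);

            // Variant 2: ConstantScoreQuery over a TermsFilter - no scoring;
            // every hit gets the same constant score.
            TermsFilter filter = new TermsFilter();
            filter.addTerm(term);
            Query unscored = new ConstantScoreQuery(filter);

            // Run both against the same IndexSearcher and compare timings.
        }
    }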

Sorting a Lucene index

2010-08-18 Thread Shelly_Singh
Hi, I have a Lucene index that contains a numeric field along with certain other fields. The order of incoming documents is random and unpredictable. As a result, while creating the index, I end up adding docs in random order with respect to the numeric field value. For example, documents may
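
Rather than trying to control insertion order, the usual approach is to index the value as a NumericField and ask for sorted results at query time. A minimal sketch, assuming the Lucene 3.x-era API and a hypothetical field name "num":

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;

    public class SortedSearch {
        // At index time: store the value as a trie-encoded numeric field.
        static Document makeDoc(long value) {
            Document doc = new Document();
            doc.add(new NumericField("num").setLongValue(value));
            return doc;
        }

        // At search time: sort any query by that field; insertion order is irrelevant.
        static TopDocs topTenByNum(IndexSearcher searcher) throws Exception {
            Sort byNum = new Sort(new SortField("num", SortField.LONG));
            return searcher.search(new MatchAllDocsQuery(), null, 10, byNum);
        }
    }

The first sorted search on a fresh reader populates the FieldCache for the field, so it is slower than the searches that follow.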

RE: Scaling Lucene to 1bln docs

2010-08-16 Thread Shelly_Singh
reasons for that? -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Aug 11, 2010 at 10:28 AM, Shelly_Singh wrote: > My final settings are: > 1. 1.5 GB RAM to the JVM out of the 2GB available on my desktop > 2. 100GB disk space. > 3. Index creation and searching tuning factor

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
Subject: RE: Scaling Lucene to 1bln docs So, you didn't really use the setRamBuffer..? Any reasons for that? -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Aug 11, 2010 at 10:28 AM, Shelly_Singh wrote: > My final settings are: > 1. 1.5 GB RAM to the JVM out of the 2GB available on my
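
For reference, the setting being asked about: in the 3.0-era API the RAM buffer is configured directly on the writer (IndexWriterConfig arrived later). A minimal sketch with a hypothetical buffer size and index path:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TunedWriter {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/path/to/index")),
                    new StandardAnalyzer(Version.LUCENE_30),
                    true,                                  // create a fresh index
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(256.0); // hypothetical value; the default is 16 MB
            // ... addDocument() loop ...
            writer.close();
        }
    }

A larger buffer means fewer, larger segment flushes during bulk indexing, which generally speeds up building an index of this size.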

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
> http://blog.anshumgupta.net > > Sent from BlackBerry® > > -Original Message- > From: Shelly_Singh > Date: Tue, 10 Aug 2010 19:11:11 > To: java-user@lucene.apache.org > Reply-To: java-user@lucene.apache.org > Subject: RE: Scaling Lucene to 1bln docs > > Hi f

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
solution. Lucene is just a tool (a fine one), but you need to use it wisely to achieve great results. On Tue, Aug 10, 2010 at 15:55, Shelly_Singh wrote: > Hmm.. I get the point. But, in my application, the document is basically a > descriptive name of a particular thing. The user will search

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
For large datasets it's a lot of tuning, custom code, and no one-size-fits-all solution. Lucene is just a tool (a fine one), but you need to use it wisely to achieve great results. On Tue, Aug 10, 2010 at 15:55, Shelly_Singh wrote: > Hmm.. I get the point. But, in my application, the document is

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
by date, otherwise random assignment > is fine. > - have a pool of IndexSearchers for each index > - when a search comes in, allocate a Searcher from each index to the search. > - perform the search in parallel across all indices. > - merge the results in your own code using
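
A minimal sketch of the fan-out step of this recipe, assuming the Lucene 3.x-era API and hypothetical shard paths; ParallelMultiSearcher queries every shard on its own thread and merges the hits for you:

    import java.io.File;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedSearch {
        public static void main(String[] args) throws Exception {
            String[] shards = {"/idx/shard0", "/idx/shard1", "/idx/shard2"};
            IndexSearcher[] searchers = new IndexSearcher[shards.length];
            for (int i = 0; i < shards.length; i++) {
                searchers[i] = new IndexSearcher(
                        FSDirectory.open(new File(shards[i])), true); // read-only
            }
            ParallelMultiSearcher multi = new ParallelMultiSearcher(searchers);
            // TopDocs top = multi.search(someQuery, 10); // merged across shards
            multi.close();
        }
    }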

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
- perform the search in parallel across all indices. - merge the results in your own code using an efficient merging algorithm. Regards, Dan -Original Message- From: Shelly_Singh [mailto:shelly_si...@infosys.com] Sent: Tuesday, August 10, 2010 8:20 AM To: java-user@lucene.apache.org Subject: RE: Scaling Lucene to
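
If you merge by hand instead, the efficient algorithm Dan alludes to is a k-way heap merge over the per-shard results. A sketch in plain Java (note that raw doc ids collide across shards, so the caller must remember which shard each hit came from):

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.PriorityQueue;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class Merger {
        // Heap entries are {shard, offset}; each shard's hits are already
        // sorted by descending score, so the heap head is the global best.
        public static ScoreDoc[] merge(final TopDocs[] shards, int n) {
            PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
                    shards.length, new Comparator<int[]>() {
                public int compare(int[] a, int[] b) {
                    return Float.compare(shards[b[0]].scoreDocs[b[1]].score,
                                         shards[a[0]].scoreDocs[a[1]].score);
                }
            });
            for (int s = 0; s < shards.length; s++) {
                if (shards[s].scoreDocs.length > 0) heap.add(new int[]{s, 0});
            }
            ScoreDoc[] out = new ScoreDoc[n];
            int k = 0;
            while (k < n && !heap.isEmpty()) {
                int[] top = heap.poll();
                out[k++] = shards[top[0]].scoreDocs[top[1]];
                if (top[1] + 1 < shards[top[0]].scoreDocs.length) {
                    heap.add(new int[]{top[0], top[1] + 1});
                }
            }
            return (k == n) ? out : Arrays.copyOf(out, k);
        }
    }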

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
index on timeline, and as a query would be associated with a particular period, you would only query the indexes containing data for that period. This would make the data manageable and searchable within reasonable time. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Aug 10, 2010 at 5:49 PM, Shelly_Singh
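
The routing side of that suggestion is simple: derive the candidate shards from the query's time period and search only those. A sketch in plain Java, assuming a hypothetical one-directory-per-year layout:

    import java.util.ArrayList;
    import java.util.List;

    public class TimeRouter {
        // e.g. /idx/2008, /idx/2009, /idx/2010 -- hypothetical path scheme
        public static List<String> shardsFor(int fromYear, int toYear) {
            List<String> paths = new ArrayList<String>();
            for (int y = fromYear; y <= toYear; y++) {
                paths.add("/idx/" + y);
            }
            return paths;
        }
    }

Open searchers only for the returned paths and query them in parallel, as in the sharding recipe quoted above; shards outside the period are never touched.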

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
I would like to know: are you using a particular type of sort? Do you need to sort on relevance? Can you shard and restrict your search to a limited set of indexes functionally? -- Anshum http://blog.anshumgupta.net Sent from BlackBerry® -Original Message- From: Shelly_Singh Date: Tue, 10

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
> -- > Anshum Gupta > http://ai-cafe.blogspot.com > > On Tue, Aug 10, 2010 at 12:24 PM, Shelly_Singh > wrote: > >> Hi, >> >> I am developing an application which uses Lucene for indexing and searching >> 1 bln documents. (the document size is

RE: Scaling Lucene to 1bln docs

2010-08-10 Thread Shelly_Singh
intermittently. You may also use a multithreaded approach in case reading the source takes time in your case; the IndexWriter would have to be shared among all threads, though. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Aug 10, 2010 at 12:24 PM, Shelly_Singh wrote: > Hi, > > I
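
A minimal sketch of that multithreaded approach, assuming the Lucene 3.x-era API: IndexWriter.addDocument() is thread-safe, so several feeder threads can share the single writer (the path, field name, and counts below are hypothetical):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ParallelFeeder {
        public static void main(String[] args) throws Exception {
            final IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/path/to/index")),
                    new StandardAnalyzer(Version.LUCENE_30),
                    true, IndexWriter.MaxFieldLength.UNLIMITED);
            Thread[] feeders = new Thread[4];
            for (int i = 0; i < feeders.length; i++) {
                final int id = i;
                feeders[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            // each thread reads and feeds its own slice of the source
                            for (int d = 0; d < 1000; d++) {
                                Document doc = new Document();
                                doc.add(new Field("name", "doc " + id + " " + d,
                                        Field.Store.NO, Field.Index.ANALYZED));
                                writer.addDocument(doc);
                            }
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
                feeders[i].start();
            }
            for (Thread t : feeders) t.join();
            writer.close();
        }
    }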

Scaling Lucene to 1bln docs

2010-08-09 Thread Shelly_Singh
Hi, I am developing an application which uses Lucene for indexing and searching 1 bln documents. (The document size is very small, though: each document has a single field of 5-10 words, so I believe that my data size is within the tested limits.) I am using the following configuration: 1.
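
A rough size check on this plan, using numbers from later in this archive (the 2010-08-26 post reports that 100 million such documents produce a 7GB index):

    1,000,000,000 docs x (7GB / 100,000,000 docs) = ~70GB

which is consistent with the 100GB disk budget quoted in the poster's final settings.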