Hi Nilesh,

I have a few queries regarding optimizing Lucene for search performance.

1. We index around 50 million text documents, with an index size greater than 40 GB, so runtime performance is crucial. Our system has only simple keyword queries. Each search returns an object of type Hits, which contains all matching results. Since usually only the top 10 results are displayed to the user, it does not make sense to search for all of the hundreds of thousands of matching documents. Is there a way to specify the maximum number of search results to be returned (and possibly get a performance benefit)?

OG: Not with Hits. You can use this, though:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20int)

2. We often use the index only to get a count of the matching documents. Only the size of the answer set of the search query is required, not the answer set itself. Is there a way to specify this while searching, for some possible performance benefit?

OG: Write your own HitCollector, say CountingHitCollector, and just count++ inside it.
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20org.apache.lucene.search.HitCollector)

If you are dealing with just Terms, you could use:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/IndexSearcher.html#docFreq(org.apache.lucene.index.Term)
or
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Searcher.html#docFreqs(org.apache.lucene.index.Term[])

3. We have really simple conjunctive keyword queries: no phrases, no fuzzy matching, and no advanced search predicates. Is there a way to specify these restrictions on our query language to get some possible performance benefit?

OG: You can write your own QueryParser and disable the types of queries that could be expensive to run, but that is it.

Otis
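
To make the answers to questions 1 and 2 a little more concrete, here is a rough, self-contained Java sketch of the two collection strategies: counting matches without keeping them, and keeping only the N best-scoring matches in a small heap. This is an illustration only and does not use Lucene. MatchCollector, CountingCollector, TopNCollector and CollectorDemo are hypothetical names; MatchCollector merely stands in for the callback role that HitCollector plays. In real code you would subclass Lucene's HitCollector, or call the IndexSearcher search method that takes an int limit, per the javadoc links in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical callback invoked once per matching document; not a Lucene class. */
interface MatchCollector {
    void collect(int docId, float score);
}

/** Counts matches and stores nothing else, in the spirit of "just count++". */
final class CountingCollector implements MatchCollector {
    private int count = 0;

    @Override
    public void collect(int docId, float score) {
        count++;
    }

    int getCount() {
        return count;
    }
}

/** Keeps only the n highest-scoring matches, using a min-heap of size at most n. */
final class TopNCollector implements MatchCollector {
    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private static final Comparator<ScoredDoc> BY_SCORE =
            Comparator.comparingDouble((ScoredDoc d) -> d.score);

    private final int n;
    private final PriorityQueue<ScoredDoc> heap;

    TopNCollector(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.heap = new PriorityQueue<>(n, BY_SCORE);
    }

    @Override
    public void collect(int docId, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(docId, score));
        } else if (score > heap.peek().score) {
            // The new match beats the weakest one kept so far: swap it in.
            heap.poll();
            heap.add(new ScoredDoc(docId, score));
        }
    }

    /** Returns the kept matches, best score first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort(BY_SCORE.reversed());
        return out;
    }
}

/** Feeds a few made-up (docId, score) pairs to both collectors. */
final class CollectorDemo {
    public static void main(String[] args) {
        float[] scores = {0.3f, 0.9f, 0.1f, 0.7f}; // made-up scores for doc ids 0..3
        CountingCollector counter = new CountingCollector();
        TopNCollector top = new TopNCollector(2);
        for (int docId = 0; docId < scores.length; docId++) {
            counter.collect(docId, scores[docId]);
            top.collect(docId, scores[docId]);
        }
        System.out.println("matches: " + counter.getCount());
        for (TopNCollector.ScoredDoc d : top.topDocs()) {
            System.out.println("doc " + d.docId + " score " + d.score);
        }
    }
}
```

The point of the sketch is that neither collector holds more than a small, fixed amount of state however many documents match, whereas a result object that exposes every match has to be prepared to serve all of them.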