Lucene Indexing
Hi all,

Can you tell me the exact indexing algorithm used by Lucene, or give me some links to documents that describe it?

Thanks in advance

-- Sairaj Sunil
Re: Lucene Indexing
Hi,

I was asking what exactly the inverted indexing strategy used for storing the index is. Is the index data structure batch-based, B-tree-based, or segment-based?

On 1/25/07, Rajiv Roopan <[EMAIL PROTECTED]> wrote:
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html

-- Sairaj Sunil
Re: Lucene Indexing
I went through that document. It says that Lucene uses an incremental indexing algorithm. So can I say that it uses a combination of segment-based and B-tree-based strategies? If I am wrong, please correct me.

On 1/26/07, Damien McCarthy <[EMAIL PROTECTED]> wrote:
> This document should contain the information you need:
> http://lucene.sourceforge.net/talks/inktomi/
>
> Damien.

-- Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam
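To make the "segment-based" option in the question concrete, here is a minimal sketch of an incremental, segment-merging inverted index in the spirit of the algorithm the linked talk describes: each new document becomes a tiny segment, segments are kept on a stack, and whenever the newest `merge_factor` segments have the same size they are merged into one. This is a toy written for illustration, not Lucene's code. The class name `ToySegmentIndex`, its methods, and the whitespace tokenizer are all assumptions made up for this example, and no B-tree is involved.

```python
from collections import defaultdict


class ToySegmentIndex:
    """Toy segment-based inverted index (illustration only, not Lucene)."""

    def __init__(self, merge_factor=3):
        self.merge_factor = merge_factor
        # Stack of segments, oldest first: (doc_count, {term: [doc_ids]}).
        self.segments = []
        self.next_doc_id = 0

    def add_document(self, text):
        """Index one document as a new single-document segment."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        # Assumption: naive lowercase whitespace tokenization.
        postings = {term: [doc_id] for term in set(text.lower().split())}
        self.segments.append((1, postings))
        self._maybe_merge()
        return doc_id

    def _maybe_merge(self):
        """Merge the newest merge_factor segments while they are equal in size."""
        b = self.merge_factor
        while len(self.segments) >= b:
            tail = self.segments[-b:]
            if len({count for count, _ in tail}) != 1:
                break
            merged = defaultdict(list)
            for _, postings in tail:  # oldest first keeps doc ids ascending
                for term, doc_ids in postings.items():
                    merged[term].extend(doc_ids)
            total = sum(count for count, _ in tail)
            self.segments[-b:] = [(total, dict(merged))]

    def search(self, term):
        """Return ids of documents containing term, consulting every segment."""
        hits = []
        for _, postings in self.segments:
            hits.extend(postings.get(term.lower(), []))
        return sorted(hits)
```

With `merge_factor=3`, adding ten documents leaves two segments (nine documents and one document): merges cascade the way carries do when counting in base 3. A search has to consult every live segment, which is why fewer, larger segments tend to help search while more, smaller ones tend to help indexing.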
Simple QueryParser question
Hi,

This is a newbie-level question. I want to construct a query that returns the results sorted as follows:

1. Results having "all the terms" of the query string in the title should be listed first.
2. Results having "any of the terms" of the query string in the title should be listed next.
3. Results having "all the terms" of the query string in the summary should be listed next.
4. Results having "any of the terms" of the query string in the summary should come last.

How do I construct the query using QueryParser? The fields to be searched are "title" and "summary". Please help me.

-- Sairaj Sunil
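One possible approach to the four tiers above is a single query string with one boosted clause per tier, using the field grouping (`title:(...)`) and boost (`^n`) forms of Lucene's query parser syntax. The sketch below only builds that string. The function name `build_tiered_query` and the boost values 8, 4, 2, 1 are assumptions chosen for illustration, and it assumes the terms are plain words with no query-syntax special characters and that the QueryParser version in use supports field grouping.

```python
def build_tiered_query(user_input, boosts=(8, 4, 2, 1)):
    """Build a query string with one boosted clause per ranking tier.

    Tiers, highest boost first: all terms in title, any term in title,
    all terms in summary, any term in summary.
    """
    terms = user_input.split()
    if not terms:
        raise ValueError("query string has no terms")
    all_terms = " ".join("+" + t for t in terms)  # every term required
    any_terms = " ".join(terms)                   # any term may match
    clauses = [
        ("title", all_terms, boosts[0]),
        ("title", any_terms, boosts[1]),
        ("summary", all_terms, boosts[2]),
        ("summary", any_terms, boosts[3]),
    ]
    return " ".join(
        "%s:(%s)^%s" % (field, body, boost) for field, body, boost in clauses
    )
```

For the input `lucene index` this yields `title:(+lucene +index)^8 title:(lucene index)^4 summary:(+lucene +index)^2 summary:(lucene index)^1`, which can be handed to QueryParser. Note that boosts only bias the score: a document with many summary matches could still outrank a weak title match, so if the four tiers must be strict, running four separate queries and concatenating the results in order may be the safer route.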
Merge factor problem,
Hi all,

I have increased the merge factor from 10 to 50, expecting indexing performance to improve. Instead, indexing takes longer than it did with a merge factor of 10. The documentation and some articles say that indexing time will improve if the merge factor is increased.

I have tried merge factors of 50, 100, and 1000, leaving minMergeDocs at its default value in all cases. The time taken to index the same number of documents increased roughly linearly, which is exactly the opposite of what I have read.

Is this the correct behavior? In which cases does it happen?

Regards

-- Sairaj Sunil
Re: Merge factor problem,
Hi,

I saw that article, and it tells me that increasing the mergeFactor speeds up indexing. But the reverse happened in my case. To be more specific, I ran some experiments on 1000 documents, changing the IndexWriter's parameters. The time taken is quite large because the documents are PDF files.

mergeFactor     minMergeDocs    Time taken
10 (default)    10 (default)    690 sec
50              10 (default)    765 sec
10 (default)    100             670 sec
100             100             738 sec

Increasing the mergeFactor did not speed up indexing, but increasing minMergeDocs did. I am using Lucene.Net. Can you explain this behavior? I am confused.

On 2/10/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Sairaj, see http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html
> Increase your maxBufferedDocs.
>
> Otis

-- Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam
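As a rough sanity check on the four configurations above, the following toy model counts how many times a document is written to a segment, either when buffered documents are flushed or when segments are merged. It is a deliberately simplified assumption about how mergeFactor and minMergeDocs interact (flush every `min_merge_docs` documents, merge whenever the newest `merge_factor` segments are equal in size), not a description of what Lucene.Net actually does internally. The function name `docs_written` is made up for this sketch.

```python
def docs_written(num_docs, merge_factor, min_merge_docs):
    """Count document writes under a simplified flush-and-merge model.

    Returns (total_writes, final_segment_sizes).
    """
    segments = []  # sizes of flushed segments, oldest first
    written = 0
    buffered = 0
    for _ in range(num_docs):
        buffered += 1
        if buffered == min_merge_docs:
            # Flush the in-memory buffer as one new segment.
            segments.append(buffered)
            written += buffered
            buffered = 0
            # Cascade merges while the newest merge_factor segments match.
            while (len(segments) >= merge_factor
                   and len(set(segments[-merge_factor:])) == 1):
                merged = sum(segments[-merge_factor:])
                segments[-merge_factor:] = [merged]
                written += merged
    if buffered:
        # Flush whatever is left at the end.
        segments.append(buffered)
        written += buffered
    return written, segments


if __name__ == "__main__":
    for mf, mmd in [(10, 10), (50, 10), (10, 100), (100, 100)]:
        print(mf, mmd, docs_written(1000, mf, mmd))
```

For 1000 documents the model gives 3000, 2000, 2000, and 1000 document writes for the four rows of the table, in order. So in this model a larger mergeFactor never adds merge work, and the slower timings reported above are not explained by it. With only 1000 documents the merge cost may simply be small next to PDF text extraction, with run-to-run noise covering the rest, but that is a guess rather than a measurement.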
Re: Merge factor problem,
Hi,

Just to give more info: I am using Lucene.Net version 1.3, not version 1.9. I think there is no setMaxBufferedDocs() option in the old version.

Can you tell me the best way to speed up indexing? Which parameters should I set? I know that this depends on the system, but which parameter exactly speeds up indexing performance?

Thank you

-- Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam