Lucene Indexing

2007-01-24 Thread Sairaj Sunil

Hi all,
Can you tell me the exact indexing algorithm used by Lucene, or give some
links to documents that describe it?
Thanks in advance.
--
Sairaj Sunil


Re: Lucene Indexing

2007-01-25 Thread Sairaj Sunil

Hi
I was asking which inverted-index strategy Lucene uses to store the index.
Is the index data structure batch-based, B-tree-based, or segment-based?


On 1/25/07, Rajiv Roopan <[EMAIL PROTECTED]> wrote:



http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html



--
Sairaj Sunil


Re: Lucene Indexing

2007-01-26 Thread Sairaj Sunil

I went through that document. It says that Lucene's indexing algorithm is
incremental. So can I say that it uses a combination of segment-based and
B-tree-based strategies? If I am wrong, please correct me.
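
For readers following the thread, here is a minimal sketch of what a segment-based incremental scheme can look like, assuming the stack-of-segments model that the linked talk appears to outline: each new document becomes a one-document segment, and whenever `merge_factor` segments of equal size sit on top of the stack they are merged into one larger segment. No B-tree is involved in this model. The function name `add_documents` and the `docs_copied` counter are illustrative assumptions, not Lucene APIs, and the model ignores disk I/O, buffering, and everything else the real IndexWriter does.

```python
def add_documents(num_docs, merge_factor=10):
    """Toy model of incremental, segment-based indexing.

    A segment is represented only by its document count. Returns the final
    stack of segment sizes and the number of documents rewritten by merges
    (a rough proxy for merge work, not a timing prediction).
    """
    stack = []        # segment sizes, newest segment at the end of the list
    docs_copied = 0   # documents rewritten while merging segments

    for _ in range(num_docs):
        stack.append(1)   # the new document starts as a 1-document segment
        size = 1
        # Cascade: while the top `merge_factor` segments all hold `size`
        # documents, replace them with one merged segment.
        while len(stack) >= merge_factor and all(
            s == size for s in stack[-merge_factor:]
        ):
            del stack[-merge_factor:]
            size *= merge_factor
            stack.append(size)
            docs_copied += size

    return stack, docs_copied


if __name__ == "__main__":
    for factor in (10, 50):
        segments, copied = add_documents(1000, merge_factor=factor)
        print(f"merge_factor={factor}: {len(segments)} segments, "
              f"{copied} documents rewritten by merges")
```

In this toy model a larger merge factor leaves more segments behind but rewrites fewer documents during merges. Whether that translates into faster indexing in practice depends on factors the sketch does not capture.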

On 1/26/07, Damien McCarthy <[EMAIL PROTECTED]> wrote:


This document should contain the information you need:

http://lucene.sourceforge.net/talks/inktomi/

Damien.
-----Original Message-----
From: Sairaj Sunil [mailto:[EMAIL PROTECTED]
Sent: 26 January 2007 03:22
To: java-user@lucene.apache.org
Subject: Re: Lucene Indexing

--
Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam


Simple QueryParser question

2007-02-01 Thread Sairaj Sunil

Hi,
This is a newbie-level question.

I want to construct a query that returns the results sorted as follows:
1. Results having "all the terms" of the query string in the title should be
listed first.
2. Results having "any of the terms" of the query string in the title
should be listed next.
3. Results having "all the terms" of the query string in the summary should
be listed next.
4. Results having "any of the terms" of the query string in the summary
should come last.

How do I construct the query using QueryParser? The fields to be searched
are "title" and "summary".
Please help me.
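
One possible approach, sketched here under stated assumptions: build a single query string in Lucene's classic QueryParser syntax with four optional clauses, one per tier, and give earlier tiers larger boosts. The sketch assumes the parser in use accepts field grouping (`title:(...)`) and `^` boosts on a group, that the default operator is OR, and that the user's terms are plain words with no special characters that need escaping. The helper name `build_tiered_query` and the boost values are made up for illustration. Boosts only bias the relevance score, so this does not strictly guarantee the four-tier ordering.

```python
def build_tiered_query(user_input, boosts=(8, 4, 2, 1)):
    """Return a Lucene query-syntax string with one boosted clause per tier.

    Assumes whitespace-separated plain terms; no escaping is performed.
    """
    terms = user_input.split()
    if not terms:
        raise ValueError("query string has no terms")

    all_terms = " ".join("+" + t for t in terms)  # every term is required
    any_terms = " ".join(terms)                   # any single term may match

    # Tiers in the order requested: title/all, title/any, summary/all, summary/any.
    tiers = [
        ("title", all_terms),
        ("title", any_terms),
        ("summary", all_terms),
        ("summary", any_terms),
    ]
    return " ".join(
        f"{field}:({group})^{boost}"
        for (field, group), boost in zip(tiers, boosts)
    )


if __name__ == "__main__":
    print(build_tiered_query("lucene index"))
```

The resulting string would then be handed to QueryParser as usual. If strict tiering is required, running the four queries separately and concatenating the result lists may be more predictable than relying on boosts.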
--
Sairaj Sunil


Merge factor problem,

2007-02-09 Thread Sairaj Sunil

Hi all,
I have increased the merge factor from 10 to 50, expecting indexing
performance to improve. Instead, indexing took longer than it did with a
merge factor of 10. The documentation and some articles say that indexing
time should improve when the merge factor is increased.
I have tried merge factors of 50, 100, and 1000, leaving minMergeDocs at
its default value in all cases. The time taken to index the same number of
documents increased in a linear fashion, which is exactly the opposite of
what I have read.
Is this the correct behavior? In which cases does it happen?

Regards
--
Sairaj Sunil


Re: Merge factor problem,

2007-02-10 Thread Sairaj Sunil

Hi,
I saw that article, and it says that increasing the mergeFactor speeds up
indexing. But the reverse happened in my case.
To be more specific, I ran some experiments on 1000 documents. The time
taken is quite large because of PDF file indexing. I changed the
IndexWriter's parameters as follows:

MergeFactor     minMergeDocs    Time taken
10 (default)    10 (default)    690 sec
50              10 (default)    765 sec
10 (default)    100             670 sec
100             100             738 sec

Increasing the mergeFactor did not speed up indexing, but increasing
minMergeDocs did. I am using Lucene.Net.
Can you explain this behavior? I am confused.
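
To put the reported timings in proportion, the short sketch below computes each run's change relative to the all-defaults run. The numbers are exactly those reported above. The dictionary layout and the helper `percent_change` are illustrative choices only.

```python
# Timings reported in the message, keyed by (mergeFactor, minMergeDocs).
timings_sec = {
    (10, 10): 690,    # both parameters at their defaults: the baseline
    (50, 10): 765,
    (10, 100): 670,
    (100, 100): 738,
}


def percent_change(baseline, value):
    """Relative change of `value` against `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


baseline = timings_sec[(10, 10)]
changes = {
    params: round(percent_change(baseline, seconds), 1)
    for params, seconds in timings_sec.items()
}

for (merge_factor, min_merge_docs), pct in changes.items():
    print(f"mergeFactor={merge_factor:<4} minMergeDocs={min_merge_docs:<4} "
          f"{pct:+.1f}% vs. defaults")
```

All four runs fall within about 11% of one another. One possible reading, given that the message attributes much of the time to PDF indexing, is that the merge settings account for only a small share of the total. That is an interpretation, not something these four measurements can establish.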

On 2/10/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:


Sairaj, see http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html

Increase your maxBufferedDocs.

Otis

----- Original Message -----
From: Sairaj Sunil <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, February 9, 2007 11:14:50 AM
Subject: Merge factor problem,

--
Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam


Re: Merge factor problem,

2007-02-10 Thread Sairaj Sunil

Hi,
Just to give more info, I am using Lucene.Net version 1.3, not version 1.9.
I think there is no setMaxBufferedDocs() option in the old version. Can you
tell me the best way to speed up indexing? Which parameters should I set? I
know that this depends on the system, but which parameter exactly speeds up
indexing performance?

Thank you
--
Sairaj Sunil
II Mtech(CS)
SSSIHL
Prashanthi Nilayam