Re: Improving search performance

2008-05-28 Thread Emmanuel Bernard
Hi Rakesh, I've spent the afternoon and the evening playing around with your test because I could not stand Hibernate Search being significantly slower than native Lucene ;) I found several causes, but as far as your test case is concerned, it turns out you are reaching the scalability limit of a

Re: IndexReader.reopen memory leak

2008-05-28 Thread John Wang
Yes, I do close the old reader. I have a large index, my system is doing real time updates: 1 thread writing batches of updates to the index, after each index update, it updates the reader. I have two readers open always, one is serving the search requests, while the other updates and the two flips

Opening an index directory inside a jar

2008-05-28 Thread Ravi_116
I get the following error trace - java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/Users/projects/workspace/project_name/web/file:/Users/.m2/repository/com/mycompany/project_name/2.1.0-internal-65-SNAPSHOT/suggesters-2.1.0-internal-65-SNAPSHOT.jar!/lu

Re: proximity search

2008-05-28 Thread Yonik Seeley
On Wed, May 28, 2008 at 5:36 AM, stefano coppi <[EMAIL PROTECTED]> wrote: > text: BB AA > query: "AA BB"~0 why the result is false? Aren't BB AA contiguous? > result: false > > text: BB AA > query: "AA BB"~1 > result: false > > text: BB AA > query: "AA BB"~2 why with proximity=2 the result is tru
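The quoted results are consistent with reading the slop as the number of position moves needed to turn the text into the exact phrase: swapping two adjacent terms takes two moves, so "BB AA" only matches "AA BB" from ~2 upward. Below is a minimal sketch of that reading for a two-term phrase. It is a simplified model for illustration, not Lucene code; the class and method names (`TwoTermSlop`, `movesNeeded`, `matches`) are hypothetical.

```java
/** Simplified model of sloppy phrase matching for a two-term query (illustration only, not Lucene code). */
class TwoTermSlop {
    /**
     * Number of position moves needed before the text matches the phrase "first second".
     * posFirst and posSecond are the token positions of the two query terms in the text.
     */
    static int movesNeeded(int posFirst, int posSecond) {
        // In an exact match the second term sits exactly one position after the first.
        return Math.abs((posSecond - 1) - posFirst);
    }

    /** True when the text is within the given slop of the phrase. */
    static boolean matches(int posFirst, int posSecond, int slop) {
        return movesNeeded(posFirst, posSecond) <= slop;
    }
}
```

For the text "BB AA", AA is at position 1 and BB at position 0, so `movesNeeded(1, 0)` is 2; for "AA ZZ BB" it is 1, and for "AA BB" it is 0.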

Re: IndexReader.reopen memory leak

2008-05-28 Thread Mark Miller
As someone that has done a lot of reopens, I can vouch there is no leak under simple, normal usage. Are you sure you're closing the original reader after getting the reopened reference? Michael Busch wrote: Hi John, hmm not good. I will take a look. It has probably to do with the reference cou

Re: IndexReader.reopen memory leak

2008-05-28 Thread Michael Busch
Hi John, hmm not good. I will take a look. It probably has to do with the reference counting. Are you doing anything special? E.g., do you have your own reader implementations that you call reopen() on? What kinds of readers are you using? Are you maybe able to provide a heapdump? -Michael John

Re: Query works in Luke but not in code...

2008-05-28 Thread Casey Dement
LOL - I sure wish it was! :) Sadly, that was a typo (Luke, for all its beauties, does not seem to grasp the concept of a clipboard so the sample was a manual transcription). A few more details - don't know if this will help or not. Same query as before, when I do a rewrite of the query in Luke I

Re: Frequencies sorted by frequencies

2008-05-28 Thread Grant Ingersoll
I think you could override all the Similarity factors except tf() with 1, such that the term frequency is the only factor in the scoring. Then you just submit the term as a query. Note, I think you will need to override the similarity during indexing, too, so that norm length is turned of

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-28 Thread Glen Newton
You should consider keeping the PageRank (and any other more dynamic data) in a separate index (with the documents in the same order as your bigger, more static index) and then use a ParallelReader on both of them. See: http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReade

Frequencies sorted by frequencies

2008-05-28 Thread Hider, Sandy
Hi All, I am trying to figure out a quick way to find the top N documents sorted by frequency of a term. I found: IndexReader.termDocs() which provides an enumeration of doc() and freq() but it returns an enumeration sorted by doc number. Is there a way to get the results sorted by freq? Or is
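One way to get the top N by frequency from an enumeration that arrives in doc-number order is a single pass with a bounded min-heap, sketched below. This is only an illustration under stated assumptions: the two arrays stand in for the doc() and freq() values read from the enumeration, and `TopFreqDocs`, `Posting`, and `topN` are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps the N (doc, freq) pairs with the highest freq. */
class TopFreqDocs {
    /** One posting: a document number and the term's frequency in that document. */
    static final class Posting {
        final int doc;
        final int freq;
        Posting(int doc, int freq) { this.doc = doc; this.freq = freq; }
    }

    /**
     * docs[i] and freqs[i] stand in for the doc()/freq() pairs, in doc-number order.
     * Returns at most n postings, highest freq first (ties by ascending doc number).
     */
    static List<Posting> topN(int[] docs, int[] freqs, int n) {
        List<Posting> result = new ArrayList<>();
        if (n <= 0) return result;
        // Min-heap on freq: the root is always the weakest of the current top N.
        PriorityQueue<Posting> heap =
            new PriorityQueue<>(n, Comparator.comparingInt((Posting p) -> p.freq));
        for (int i = 0; i < docs.length; i++) {
            if (heap.size() < n) {
                heap.add(new Posting(docs[i], freqs[i]));
            } else if (freqs[i] > heap.peek().freq) {
                heap.poll();                              // drop the current weakest
                heap.add(new Posting(docs[i], freqs[i]));
            }
        }
        result.addAll(heap);
        result.sort(Comparator.comparingInt((Posting p) -> p.freq).reversed()
                              .thenComparingInt(p -> p.doc));
        return result;
    }
}
```

The heap never holds more than N entries, so memory stays small even when the term occurs in many documents.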

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-28 Thread 过佳
I think this is not suitable for my system, since the number of pages is very large and reindexing would cost a lot of time. 2008/5/28, Ian Lea <[EMAIL PROTECTED]>: > > Yes. But you'd have to do that anyway if you are storing pagerank in the > index. > > One point on your 20s response time for sorting -

Re: Boolean Query Issue

2008-05-28 Thread Erick Erickson
It's unclear what you *should* expect. How is your data distributed? In other words, how many documents do you have? In this example, for instance, 1. TTL:data AND TTL:store OR TTL:variable => 3,733 results it considered the TTL:data part only. It's perfectly reasonable if every document that had

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-28 Thread Ian Lea
Yes. But you'd have to do that anyway if you are storing pagerank in the index. One point on your 20s response time for sorting - is that for the first sort or subsequent ones? I believe that the first one will usually be substantially slower. But sorting is always likely to be slower than not so

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-28 Thread 过佳
Thanks Ian, but does this mean that I must reindex these pages whenever the pagerank score changes? On 08-5-28, Ian Lea <[EMAIL PROTECTED]> wrote: > > Hi > > > Maybe you could use the pagerank score, possibly modified, as document > boost at indexing time. From the javadocs for > Document.setBoost(boost) > >

Boolean Query Issue

2008-05-28 Thread Sonu Sudhakar
Hi, I have an issue with boolean queries. I am using Lucene-core-2.3.1. I ran a test on a boolean query with 3 terms (data, store, variable) in my TTL field. The TTL field is indexed and searched using StandardAnalyzer. The three terms, when searched individually, gave the following result 1

Re: Refreshing IndexReaders for our desktop searching app

2008-05-28 Thread Mark Miller
If you are using a more recent version of Lucene you might check out https://issues.apache.org/jira/browse/LUCENE-1026 and try the WarmingIndexAccessor. Even if you don't use it, it will serve as a decent example. Ian Lea wrote: Hi I think that you will need to close your reader objects.

Re: Refreshing IndexReaders for our desktop searching app

2008-05-28 Thread Ian Lea
Hi I think that you will need to close your reader objects. Hanging on to them may prevent files from being deleted and you are likely to hit memory or open file limitations. We generally use a low tech approach: save reference to old reader/searcher create new one and give that out to those
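The first steps of the low tech approach described above (keep a reference to the old reader, publish a new one to later callers) can be sketched as follows. This is a hedged illustration, not the poster's code: `ReaderHolder` is a hypothetical name, any `Closeable` stands in for a reader or searcher, and the sketch deliberately leaves the timing of closing the old instance to the caller, since closing it while a search is still using it would be unsafe.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder that hands out the current reader and swaps in a fresh one. */
class ReaderHolder<R extends Closeable> {
    private final AtomicReference<R> current;

    ReaderHolder(R initial) {
        current = new AtomicReference<>(initial);
    }

    /** Callers asking from now on receive whichever reader is current. */
    R get() {
        return current.get();
    }

    /**
     * Publishes the fresh reader and returns the old one.
     * The caller keeps that reference and closes it once nothing uses it any more.
     */
    R swap(R fresh) {
        return current.getAndSet(fresh);
    }
}
```

Returning the old instance from `swap` keeps the "save reference to old reader" step explicit, so it is not lost before it can be closed.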

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-28 Thread Ian Lea
Hi Maybe you could use the pagerank score, possibly modified, as document boost at indexing time. From the javadocs for Document.setBoost(boost) "Sets a boost factor for hits on any field of this document. This value will be multiplied into the score of all hits on this document" so will give

How to add PageRank score with lucene's relevant score in sorting

2008-05-28 Thread 过佳
Hi all, I have a problem: how to "combine" two scores to sort the search result documents. For example, I have 10 million pages in a Lucene index, and I know their pagerank scores. I give it a query, and every doc returned has a lucene-score; mark it as R (relevant score), and i al

Refreshing IndexReaders for our desktop searching app

2008-05-28 Thread Christian Reuschling
Hello out there, We have implemented an open source desktop search app based on Lucene: http://sourceforge.net/projects/dynaq Development is ongoing, and we are currently experimenting with the file-lock based writer (/reader) synchronization capabilities of Lucene, in order to waste

proximity search

2008-05-28 Thread stefano coppi
Hello everyone, I'm testing the use of the proximity search operator (~) in Lucene. I noticed strange behaviour when the terms in the text are not in the same order as in the query. Here are some examples: text: AA BB query: "AA BB"~0 result: true text: AA ZZ BB query: "AA BB"~0 result: false tex
