Re: Performance of never optimizing

2008-11-03 Thread Mark Miller
Been a while since I've been in the benchmark stuff, so I am going to take some time to look at this when I get a chance, but off the cuff I think you are opening and closing the reader for each search. Try using the openreader task before the 100 searches and then the closereader task. That will

Re: Reading from an IndexWriter

2008-11-03 Thread Erick Erickson
One thing that others have tried is to keep a RAMindex that you use for your modifications. That is, an index that *only* has your mods, not your original index. But, and here's the key, when you update, you update BOTH your RAM and FS based indexes. When searching, you search BOTH indexes, giving
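
A minimal sketch of the dual-index idea described above, not code from the thread. To stay self-contained it uses plain Java maps (document id to text) as stand-ins for the RAM and FS indexes instead of Lucene classes. The class name `DualIndexSketch`, the `reopen()` step, and the rule that a pending RAM entry hides the older on-disk copy are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in: plain maps (doc id -> text) play the role of the indexes.
class DualIndexSketch {
    // The FS-based index as written; every update lands here.
    private final Map<String, String> fsIndex = new HashMap<>();
    // What a long-lived searcher sees of the FS index; it changes only on reopen().
    private Map<String, String> fsSnapshot = new HashMap<>();
    // The RAM index: holds *only* the modifications made since the last reopen().
    private final Map<String, String> ramIndex = new HashMap<>();

    // An update goes to BOTH indexes.
    void update(String id, String text) {
        fsIndex.put(id, text);
        ramIndex.put(id, text);
    }

    // Assumed maintenance step: refresh the FS view and drop the pending mods.
    void reopen() {
        fsSnapshot = new HashMap<>(fsIndex);
        ramIndex.clear();
    }

    // A search consults BOTH indexes. Assumption: an id present in the RAM
    // index supersedes the (possibly stale) copy in the FS snapshot.
    Set<String> search(String word) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : ramIndex.entrySet()) {
            if (hasWord(e.getValue(), word)) {
                hits.add(e.getKey());
            }
        }
        for (Map.Entry<String, String> e : fsSnapshot.entrySet()) {
            boolean superseded = ramIndex.containsKey(e.getKey());
            if (!superseded && hasWord(e.getValue(), word)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Whitespace-tokenized exact word match; a deliberately crude "analyzer".
    private static boolean hasWord(String text, String word) {
        return Arrays.asList(text.split("\\s+")).contains(word);
    }

    public static void main(String[] args) {
        DualIndexSketch idx = new DualIndexSketch();
        idx.update("1", "old apple");
        idx.reopen();
        idx.update("1", "new pear");   // modifies a doc the FS snapshot already holds
        idx.update("2", "apple pie");  // adds a doc the FS snapshot has not seen
        System.out.println(idx.search("apple"));
    }
}
```

In this sketch a search for "apple" no longer returns document 1, because its pending modification in the RAM map hides the stale on-disk copy, while document 2 is found without reopening the on-disk view.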

Re: Performance of never optimizing

2008-11-03 Thread Justus Pendleton
On 03/11/2008, at 11:07 PM, Mark Miller wrote: Am I missing your benchmark algorithm somewhere? We need it. Something doesn't make sense. I thought I had included it at [1] before, but apparently not; my apologies for that. I have updated that wiki page. I'll also reproduce it here: { "Ro

Reading from an IndexWriter

2008-11-03 Thread Matthew DeLoria
I had a question more about best practices and reading from an IndexWriter. Currently, we have an index which we call the master index. This index, in itself, represents our data model. Many clients can access this index. However, we have importer and updating clients which essentially add

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-03 Thread Todd Benge
Pablo, Would you mind adding a little more detail about how you're working around the problem? I'm still evaluating our different options so am interested in what you did. Todd On Mon, Nov 3, 2008 at 2:37 PM, PabloS <[EMAIL PROTECTED]> wrote: > > Thanks hossman, but I've already 'solved' the pr

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-03 Thread PabloS
Thanks hossman, but I've already 'solved' the problem without the need to patch lucene. I had to code a bit around Lucene's visibility restrictions but I've managed to completely skip the field caching mechanism and add ehcache to it. At the moment it seems to be working quite well, although not

Re: No segment files found/ Searcher error

2008-11-03 Thread Michael McCandless
part of the check, this exception is thrown: Error: could not read any segments file in directory java.io.FileNotFoundException: no segments* file found in [EMAIL PROTECTED]/rt10/jetty/20081103 at org.apache.lucene.index.SegmentInfos$findSegmentsFile.run(SegementInfos.java:587)

How to use TermFreqVector to search similar documents, and the BooksLikeThis example

2008-11-03 Thread Teruhiko Kurosaka
Hi, I'd like to find documents that are similar to the one I have in the index (or the one I am about to add, if there is no similar document... I prefer this way if possible). If I understand it correctly, I should be able to use TermFreqVector for this. I wanted to tell Lucene, "search for simil

Sort search by weight of search term

2008-11-03 Thread stephenlindauer
I have a Lucene search and I want to implement a way to sort the search by giving one search term more importance than another and sort it by the scores I'm getting. What would be the best way to do something like this? -- View this message in context: http://www.nabble.com/Sort-search-by-weight

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-03 Thread Chris Hostetter
: I'm having a similar problem with my application, although we are using : lucene 2.3.2. The problem we have is that we are required to sort on most of : the fields (20 at least). Is there any way of changing the cache being used? there is a patch in Jira that takes a completely different approa

No segment files found/ Searcher error

2008-11-03 Thread JulieSoko
part of the check, this exception is thrown: Error: could not read any segments file in directory java.io.FileNotFoundException: no segments* file found in [EMAIL PROTECTED]/rt10/jetty/20081103 at org.apache.lucene.index.SegmentInfos$findSegmentsFile.run(SegementInfos.java:587). Is

RE: Read all the data from an index

2008-11-03 Thread Dragon Fly
Thank you both for your help. > Date: Fri, 31 Oct 2008 09:06:50 +0100 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Read all the data from an index > > Erick Erickson wrote: > > I'm not sure what *could* be easier than looping with IndexSearcher.doc(), > > looping fro

Re: Performance of never optimizing

2008-11-03 Thread Toke Eskildsen
On Mon, 2008-11-03 at 04:42 +0100, Justus Pendleton wrote: > 1. Why does the merge factor of 4 appear to be faster than the merge > factor of 2? Because you alternate between updating the index and searching? With 4 segments, chances are that most of the segment-data will be unchanged between sear

Re: Performance of never optimizing

2008-11-03 Thread Mark Miller
Am I missing your benchmark algorithm somewhere? We need it. Something doesn't make sense. - Mark Justus Pendleton wrote: Howdy, I have a couple of questions regarding some Lucene benchmarking and what the results mean[3]. (Skip to the numbered list at the end if you don't want to read the

RE: Performance of never optimizing

2008-11-03 Thread Ard Schrijvers
Hello Justus, Chris and Otis, IIRC Ocean [1] by Jason Rutherglen addresses the issue for real time searches on large data sets. A conceptually comparable implementation is done for Jackrabbit, where you can see an enlightening picture over here [2]. In short: 1) IndexReaders are opened only once