Re: IndexReader.reopen memory leak

2008-05-29 Thread John Wang
My client does not call my reader.reopen(); I have implemented a reload() method on my reader (void reload()), which discards the internal reader upon a reload. Due to another issue (an API issue with IndexReader, e.g. all derived implementations have to reimplement reopen because it has to re

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Paul J. Lucas
On May 29, 2008, at 6:35 PM, Michael McCandless wrote: Can you use lsof (or something similar) to see how many files you have? FYI: I personally can't reproduce this; only a coworker can and even then it's sporadic, so it could take a little while. Merging, especially several running at o

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Mark Miller
Forgot to mention...keep trying if you get read past file exception...I get that sometimes too. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Mark Miller
Michael McCandless wrote: Michael Busch wrote: Of course it can happen that you run out of available file descriptors when a lot of threads open separate IndexReaders, and then the SegmentMerger could certainly hit IOExceptions, but I don't think a FileNotFoundException would be thrown in su

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Paul J. Lucas
On May 29, 2008, at 6:26 PM, Michael McCandless wrote: Paul J. Lucas wrote: if ( IndexReader.isLocked( INDEX ) ) IndexReader.unlock( INDEX ); The isLocked()/unlock() is because sometimes the server process gets killed and leaves the index locked. This makes me a bit

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Paul J. Lucas
On May 29, 2008, at 5:57 PM, Mark Miller wrote: Paul J. Lucas wrote: Are you saying that using multiple IndexSearchers will definitely cause the problem I am experiencing and so the suggestion that using a single IndexSearcher for optimization only is wrong? Will it definitely cause your p

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Michael McCandless
Michael Busch wrote: Of course it can happen that you run out of available file descriptors when a lot of threads open separate IndexReaders, and then the SegmentMerger could certainly hit IOExceptions, but I don't think a FileNotFoundException would be thrown in such a case. I think I'v

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Michael McCandless
Paul J. Lucas wrote: if ( IndexReader.isLocked( INDEX ) ) IndexReader.unlock( INDEX ); The isLocked()/unlock() is because sometimes the server process gets killed and leaves the index locked. This makes me a bit nervous. Does this only run on startup of your proces

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Mark Miller
Michael Busch wrote: Mark Miller wrote: Paul J. Lucas wrote: Also, if you get a ton of concurrent searches, you will have an IndexReader open for each...not only is this very wasteful in terms of RAM and time, but as your IndexWriter merges you can have all kinds of momentary references to

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Michael Busch
Mark Miller wrote: Paul J. Lucas wrote: Also, if you get a ton of concurrent searches, you will have an IndexReader open for each...not only is this very wasteful in terms of RAM and time, but as your IndexWriter merges you can have all kinds of momentary references to normally unneeded inde

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Mark Miller
Paul J. Lucas wrote: On May 29, 2008, at 5:18 PM, Mark Miller wrote: It looks to me like you are not sharing an IndexSearcher across threads. My reading of the documentation says that doing so is an optimization only and not a requirement. Are you saying that using multiple IndexSearchers

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Paul J. Lucas
On May 29, 2008, at 5:18 PM, Mark Miller wrote: It looks to me like you are not sharing an IndexSearcher across threads. My reading of the documentation says that doing so is an optimization only and not a requirement. Are you saying that using multiple IndexSearchers will definitely ca

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Mark Miller
It looks to me like you are not sharing an IndexSearcher across threads. You really should, or use a small pool of them (depending on speed/ram/load). The only time I usually see this error, I also see too many files open first. Are you sure you don't have another exception as well? Paul J
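The advice above (share one IndexSearcher, or a small pool, across threads) can be illustrated with a minimal sketch. It deliberately does not use the Lucene API: FakeSearcher is a hypothetical stand-in for an expensive-to-open searcher, and runSearches is an illustrative helper that counts how many instances get opened when tasks share one searcher versus opening one each.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSearcherSketch {

    /** Hypothetical stand-in for an expensive searcher; NOT the Lucene IndexSearcher. */
    static final class FakeSearcher {
        FakeSearcher(AtomicInteger opened) {
            opened.incrementAndGet(); // record that one more "searcher" was opened
        }

        int search(String term) {
            return term.length(); // placeholder work; assumed safe to call concurrently
        }
    }

    /** Runs the given number of search tasks and returns how many searchers were opened. */
    static int runSearches(int tasks, boolean share) {
        AtomicInteger opened = new AtomicInteger();
        // When sharing, a single instance is created up front and reused by every task.
        FakeSearcher shared = share ? new FakeSearcher(opened) : null;
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                FakeSearcher s = (shared != null) ? shared : new FakeSearcher(opened);
                s.search("lucene");
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return opened.get();
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + runSearches(20, true));
        System.out.println("per-task: " + runSearches(20, false));
    }
}
```

With a real searcher, each extra instance would also hold its own open files and in-memory structures, which is the cost the post is warning about.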

FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Paul J. Lucas
I occasionally get a FileNotFoundException like: Exception in thread "Thread-44" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /Stuff/Caches/AuroraSupport/IM_IndexCache/INDEX/_27.cfs (No such file or directory) at org.apache.lucene.index.ConcurrentMergeSc

Re: lucene memory consumption

2008-05-29 Thread Yonik Seeley
2008/5/29 Alex <[EMAIL PROTECTED]>: > I believe we have around 346 million documents So that would be 346MB per indexed field that you search. Also, if you sort on anything other than score, that will take up a lot of memory to un-invert the field. -Yonik
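The 346MB figure can be reproduced with simple arithmetic. The sketch below assumes one byte of norms per document per indexed field (with norms enabled), which is my understanding of how Lucene of this era stores them; the class and method names are illustrative only, and megabytes are taken as 10^6 bytes.

```java
public class NormsMemoryEstimate {

    /** Assumption: one norm byte per document per indexed field that has norms enabled. */
    static long normsBytes(long numDocs, int indexedFieldsWithNorms) {
        return numDocs * indexedFieldsWithNorms;
    }

    /** Converts bytes to decimal megabytes (10^6 bytes). */
    static double toMegabytes(long bytes) {
        return bytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        long docs = 346_000_000L; // "around 346 million documents" from the thread
        for (int fields = 1; fields <= 3; fields++) {
            System.out.println(fields + " field(s): " + toMegabytes(normsBytes(docs, fields)) + " MB");
        }
    }
}
```

Under that assumption the cost grows linearly with the number of searched fields, so a handful of fields on an index this size would account for a large share of the 2.5GB reported elsewhere in the thread. Sorting caches would come on top of this and are not modeled here.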

RE: lucene memory consumption

2008-05-29 Thread Alex
I believe we have around 346 million documents Alex > Date: Thu, 29 May 2008 18:39:31 -0400 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: lucene memory consumption > > Alex wrote: >> Currently, searching on our index consume

Re: lucene memory consumption

2008-05-29 Thread Daniel Noll
On Friday 30 May 2008 08:17:52 Alex wrote: > Hi, > other than the in memory terms (.tii), and the few kilobytes of opened file > buffer, where are some other sources of significant memory consumption when > searching on a large index ? (> 100GB). The queries are just normal term > queries. Norms

Re: lucene memory consumption

2008-05-29 Thread Mark Miller
Alex wrote: > Currently, searching on our index consumes around 2.5GB of ram. > This is just a single term query, nothing that requires the in memory cache > like in > the FieldScoreQuery. > > > Alex > > > That seems rather high. You have 10/15 million

RE: lucene memory consumption

2008-05-29 Thread Alex
Currently, searching on our index consumes around 2.5GB of ram. This is just a single term query, nothing that requires the in memory cache like in the FieldScoreQuery. Alex > Date: Thu, 29 May 2008 15:25:43 -0700 > From: [EMAIL PROTECTED] > To: java-

Re: lucene memory consumption

2008-05-29 Thread jian chen
Not that I can think of. But if you have any cached field data or norms arrays, those could be huge. I would be interested in hearing from others on this topic as well. Jian On 5/29/08, Alex <[EMAIL PROTECTED]> wrote: > > Hi, > other than the in memory terms (.tii), and the few kilobytes of

lucene memory consumption

2008-05-29 Thread Alex
Hi, other than the in memory terms (.tii), and the few kilobytes of opened file buffer, where are some other sources of significant memory consumption when searching on a large index ? (> 100GB). The queries are just normal term queries.

date filter filtering out non-dated items?

2008-05-29 Thread Phillip Rhodes
We have many different types of objects that we are indexing with Lucene (coupons, roadtrips, events, attractions, etc). Because events and coupons can expire, we would like to apply a date filter to the query to filter out the expired items, but the problem is that there are other objects l

Re: Single IndexReader vs Single IndexSearcher

2008-05-29 Thread Mark Miller
Vinicius Carvalho wrote: Hello there! My application uses multiple indexes, so I create a multireader based on my indexreaders. What I've done is create a Map of Readers, and whenever the user needs a reader I iterate over my collection, checking if it is the current index, if not I reopen it, el

Single IndexReader vs Single IndexSearcher

2008-05-29 Thread Vinicius Carvalho
Hello there! My application uses multiple indexes, so I create a multireader based on my indexreaders. What I've done is create a Map of Readers, and whenever the user needs a reader I iterate over my collection, checking if it is the current index, if not I reopen it, else, I add it to my multirea

Re: IndexReader.reopen memory leak

2008-05-29 Thread Michael Busch
Does your FilteredIndexReader.reopen() return a new instance of FilteredIndexReader in case the inner reader was updated (i.e. in!=newInner)? -Michael John Wang wrote: Yes: IndexReader newInner=in.reopen(); if (in!=newInner) { in.close(); this.in=newInner;

Re: IndexReader.reopen memory leak

2008-05-29 Thread Yonik Seeley
On Thu, May 29, 2008 at 12:25 AM, John Wang <[EMAIL PROTECTED]> wrote: > I am using my implementation of a FilteredIndexReader. Perhaps this is the issue? Can you distill a testcase that shows the problem? -Yonik

Re: IndexReader.reopen memory leak

2008-05-29 Thread Mark Miller
Are you sure you don't have a reference to that old Reader somewhere, hanging around? Maybe this is fixed since I grabbed my copy of Lucene, but I can loop a reopen pretty much forever, and monitoring the memory I see not even the tiniest leak over many many many reopens. I've been using visual

Re: IndexReader.reopen memory leak

2008-05-29 Thread John Wang
Yes: IndexReader newInner=in.reopen(); if (in!=newInner) { in.close(); this.in=newInner; // code to clean up my data _cache.clear(); _indexData.load(this, true); init(_fieldConfig); } if I change this code to: try {
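The question raised in this thread is whether a wrapping reader's reopen() should return a new wrapper instance when the inner reader changed, rather than mutating itself in place. A minimal sketch of that contract follows. It uses hypothetical stand-in classes (Inner, Wrapper) and an illustrative refresh helper, not the Lucene IndexReader API, and the integer version is only a device to simulate an index change.

```java
public class ReopenSketch {

    /** Hypothetical stand-in for an underlying reader; NOT the Lucene IndexReader. */
    static class Inner {
        final int version;
        boolean closed;

        Inner(int version) {
            this.version = version;
        }

        /** Returns this instance if nothing changed, otherwise a fresh instance. */
        Inner reopen(int latestVersion) {
            return latestVersion == version ? this : new Inner(latestVersion);
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical wrapper that never mutates itself on reopen. */
    static class Wrapper {
        final Inner in;

        Wrapper(Inner in) {
            this.in = in;
        }

        /** Same contract as Inner.reopen: same instance if unchanged, new wrapper otherwise. */
        Wrapper reopen(int latestVersion) {
            Inner newInner = in.reopen(latestVersion);
            return newInner == in ? this : new Wrapper(newInner);
        }

        void close() {
            in.close();
        }
    }

    /** Caller-side swap: close the old wrapper only when a different one came back. */
    static Wrapper refresh(Wrapper current, int latestVersion) {
        Wrapper next = current.reopen(latestVersion);
        if (next != current) {
            current.close();
        }
        return next;
    }

    public static void main(String[] args) {
        Wrapper w1 = new Wrapper(new Inner(1));
        Wrapper w2 = refresh(w1, 2);
        System.out.println("new wrapper: " + (w2 != w1) + ", old closed: " + w1.in.closed);
    }
}
```

Keeping the wrapper immutable means any per-reader caches can be built in the new wrapper's constructor, and the old wrapper (with its caches) becomes garbage once the caller drops its reference. Whether that resolves the leak reported here is not established by the thread.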

Re: Using highlighter

2008-05-29 Thread Mark Miller
Vinicius Carvalho wrote: Hello there! When I use a wildcard with my query, for instance: java*. Lucene finds the document, but when using the highlighter, the getBestFragment() is returning null for a fragment that contains the word javadoc, for instance. Is it possible to use the highlighter wi

RE: Improving search performance

2008-05-29 Thread Rakesh Shete
Hi Emmanuel, Thanks for sparing time for this. At least now it looks like the problem is clear. I will definitely try the pooled IndexSearch approach. Could you let me know if there is a way of providing the indexsearcher instance to the Hibernate Search FullTextQuery API? If that's not possi

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-29 Thread Cam Bazz
Hello, a little off topic, but how did you obtain the PageRank for each page? Did you calculate it, or did you obtain it some other way while getting a specific site? Best. On Thu, May 29, 2008 at 3:28 PM, 过佳 <[EMAIL PROTECTED]> wrote: > thanks Glen, we have tried it, but the bottleneck

Using highlighter

2008-05-29 Thread Vinicius Carvalho
Hello there! When I use a wildcard with my query, for instance: java*. Lucene finds the document, but when using the highlighter, the getBestFragment() is returning null for a fragment that contains the word javadoc, for instance. Is it possible to use the highlighter with wildcards? One option I

RE: Frequencies sorted by frequencies

2008-05-29 Thread Hider, Sandy
Thanks for taking the time to answer. I see what you mean. The thing is I also plan on using the standard score. Would there be a way to use both the standard score and the TF-only score in a single index? Sandy -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] S

Re: How to add PageRank score with lucene's relevant score in sorting (with Paralle Index modify)

2008-05-29 Thread Chris
I have a question about ParallelReader. I want to modify the dynamic index; how could I set the same docid to add the original docid with the more static index? Does anyone have an idea or method to do it well? Thank you. above ChrisLin 2008/5/28 Glen Newton <[EMAIL

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-29 Thread 过佳
Thanks Glen, we have tried it, but the bottleneck is to get the document (indexReader.document(num)), so it is not efficient enough. 2008/5/28, Glen Newton <[EMAIL PROTECTED]>: > > You should consider keeping the PageRank (and any other more dynamic > data) in a separate index (with the documen

Re: Boolean Query Issue

2008-05-29 Thread Sonu Sudhakar
Erick, Thanks for your reply. I am working with approximately 1 million documents. They are indexed on 4 servers. Each document has multiple fields. I am using ParallelMultiSearcher for searching. I tried a few queries in the title (TTL) field. I started with a simple query without boole

"No tvx file" error

2008-05-29 Thread Pablo B.
Hello, I am writing code to convert all text files in subdirectories of a given path to an Arff file for weka. To do so, I am using lucene-1.4.3.jar. The call to method writer.add(doc) outputs (only for some text files) the error message "No tvx file", where writer is of type IndexWriter and doc is

Re: IndexReader.reopen memory leak

2008-05-29 Thread Michael Busch
Could you share some details about how you implemented reopen() in your reader? -Michael John Wang wrote: Yes, I do close the old reader. I have a large index, my system is doing real time updates: 1 thread writing batches of updates to the index, after each index update, it updates the reader