Re: Index Partitioning

2009-03-23 Thread Shashi Kant
This is perfect, exactly what I was looking for. Thanks much Andrzej! On Mon, Mar 23, 2009 at 1:43 AM, Andrzej Bialecki wrote: > Shashi Kant wrote: > >> Is there an "elegant" approach to partitioning a large Lucene index (~1TB) >> into smaller sub-indexes other than the obvious method of re-ind

Re: Lucene 2.2.0 in 64-bit JVM: IndexReader is hung

2009-03-23 Thread Michael McCandless
Still, it's good to raise such issues here, and track them with Jira. Even if it's not a Lucene issue, it'd be great to help Sun fix the JRE issue. It's possible to reduce the Lucene case down to a simpler test case to give to Sun. This is what happened with LUCENE-1282, which resulted in Sun f

RE: Lucene 2.2.0 in 64-bit JVM: IndexReader is hung

2009-03-23 Thread Venkat Rangan
Yonik, Thanks for your response. It is actually hanging on RandomAccessFile.readBytes() when there are in fact bytes to read. Switching to 32-bit JVM does not hang against the same index. Also, as you point out, this may be a JVM/OS issue, but Lucene just exposes it. -venkat -Original Mes

Re: Memory Leak?

2009-03-23 Thread Michael McCandless
Is there anything else in this JRE? 65 MB ought to be plenty for what you are trying to do w/ just Lucene, I think. Though to differentiate whether "you are not giving enough RAM to Lucene" vs "you truly have a memory leak", you should try increasing the heap size to something absurdly big (256
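Michael's suggestion — rule out an undersized heap before hunting a leak — can be checked from inside the process. A minimal sketch using only the standard `Runtime` API (the class name and thresholds here are illustrative, not from the thread):

```java
// Report the heap limits the JVM was actually started with, so you can
// tell "heap too small" apart from a genuine leak before profiling.
public class HeapReport {
    public static long maxHeapBytes() {
        // the effective -Xmx value as seen by the running JVM
        return Runtime.getRuntime().maxMemory();
    }

    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %d MB, used: %d MB%n",
                maxHeapBytes() / (1024 * 1024),
                usedHeapBytes() / (1024 * 1024));
    }
}
```

Run with, e.g., `java -Xmx256m HeapReport` to confirm the larger heap actually took effect; under Tomcat the heap is normally raised via `CATALINA_OPTS` rather than on the command line.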

Re: Memory Leak?

2009-03-23 Thread Chetan Shah
I am using the default heap size, which according to NetBeans is around 65MB. If the RAM directory was not initialized correctly, how am I getting valid search results? I am able to execute searches for quite some time before I get an OOME. Makes sense? Or maybe I am missing something, please let m

Re: Lucene 2.2.0 in 64-bit JVM: IndexReader is hung

2009-03-23 Thread Yonik Seeley
So even when you stop searches, you observe a thread stuck on RandomAccessFile.readBytes()? In an active system, it would be normal to see threads blocked there... just on different calls. If RandomAccessFile.readBytes() is actually hanging, it's not a Lucene issue, but a JVM/OS bug. -Yonik htt

Re: Memory Leak?

2009-03-23 Thread Matthew Hall
Perhaps this is a simple question, but looking at your stack trace I'm not seeing where it was set during the Tomcat initialization, so here goes: are you setting up the JVM's heap size during your Tomcat initialization somewhere? If not, that very well could be part of your issue, as the st

Re: Memory Leak?

2009-03-23 Thread Chetan Shah
The stack trace is attached. http://www.nabble.com/file/p22667542/dump dump The file sizes: _30.cfx: 1462KB, _32.cfs: 3432KB, _30.cfs: 645KB. Michael McCandless-2 wrote: > > > Hmm... after how many queries do you see the crash? > > Can you post the full OOME stack trace? > > You're

Lucene 2.2.0 in 64-bit JVM: IndexReader is hung

2009-03-23 Thread Venkat Rangan
Hi, We have an application using Lucene 2.2.0 running in a Sun HotSpot JVM, with JDK 1.6.0. We have had no problems with it in the 32-bit version of the JVM. We recently upgraded to 64-bit JVM and occasionally, we are observing a hang. In particular, the stack trace looks like this - the Random
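To confirm where a thread is actually parked (as Yonik asks later in the thread), the stack of every live thread can be captured from inside the process with plain `java.lang.Thread` — a hedged sketch, not part of the original report:

```java
import java.util.Map;

// Dump every live thread's stack from inside the process, to confirm
// whether a reader thread really is parked in RandomAccessFile.readBytes().
public class ThreadDumper {
    public static String dumpAll() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\" ")
              .append(e.getKey().getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAll());
    }
}
```

The same information is available externally via `jstack <pid>`, which is usually more convenient on a hung production JVM.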

Re: Memory Leak?

2009-03-23 Thread Michael McCandless
Hmm... after how many queries do you see the crash? Can you post the full OOME stack trace? You're using a RAMDirectory to hold the entire index... how large is your index? Mike Chetan Shah wrote: After reading this forum post : http://www.nabble.com/Lucene-Memory-Leak-tt19276999.html#a

Re: Memory Leak?

2009-03-23 Thread Chetan Shah
After reading this forum post: http://www.nabble.com/Lucene-Memory-Leak-tt19276999.html#a19364866 I created a singleton for StandardAnalyzer too. But the problem still persists. I have two singletons now: one for StandardAnalyzer and the other for IndexSearcher. The code is as follows: package w

Re: Memory Leak?

2009-03-23 Thread Chetan Shah
No, I have a singleton from where I get my searcher and it is kept throughout the application. Michael McCandless-2 wrote: > > > Are you not closing the IndexSearcher? > > Mike > > Chetan Shah wrote: > >> >> I am initiating a simple search and after profiling my >> application using
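The singleton-searcher pattern discussed in this thread is commonly written with the initialization-on-demand holder idiom. A sketch with a placeholder `Searcher` class standing in for an expensive shared resource such as a Lucene IndexSearcher (wiring a real index is out of scope here):

```java
// Lazy, thread-safe singleton via the initialization-on-demand holder idiom.
// "Searcher" is a stand-in for an expensive shared resource such as a Lucene
// IndexSearcher: open it once and reuse it for every request.
public class SearcherHolder {
    // placeholder for the real resource
    public static class Searcher {
        public final long openedAtNanos = System.nanoTime();
    }

    private static class Holder {
        // the JVM guarantees this initializer runs exactly once, on first access
        static final Searcher INSTANCE = new Searcher();
    }

    public static Searcher get() {
        return Holder.INSTANCE;
    }
}
```

Every caller then sees the same instance, which avoids both the cost of reopening and the leak pattern of opening a new searcher per request without closing the old one.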

Re: First request for search is taking a longer time and subsequent requests are very fast

2009-03-23 Thread thiruvee
Hi I have 3 instances running for this. 1. Windows + tomcat 2. Linux + tomcat 3. Linux + WebSphere I observed this problem on all 3 instances. Thanks Ravi Michael McCandless-2 wrote: > > > Which OS are you on? > > It's possible the OS has decided to swap Tomcat's pages out to use RAM >

Re: First request for search is taking a longer time and subsequent requests are very fast

2009-03-23 Thread Michael McCandless
Which OS are you on? It's possible the OS has decided to swap Tomcat's pages out to use RAM as IO cache for other processes, instead. Mike thiruvee wrote: Hi David, Thanks for your reply. 1. I will try having warm up queries after index is created, that will solve to some extent. 2.

Re: Memory Leak?

2009-03-23 Thread Michael McCandless
Are you not closing the IndexSearcher? Mike Chetan Shah wrote: I am initiating a simple search and after profiling my application using NetBeans, I see constant heap consumption and eventually a server (tomcat) crash due to an "out of memory" error. The thread count also keeps on inc

Re: First request for search is taking a longer time and subsequent requests are very fast

2009-03-23 Thread thiruvee
Hi David, Thanks for your reply. 1. I will try having warm-up queries after the index is created; that will solve it to some extent. 2. The biggest problem is that the server can be idle for a long time. I am using Spring in my project, and the searcher and reader are singleton objects managed by Spring. I don't u

Memory Leak?

2009-03-23 Thread Chetan Shah
I am initiating a simple search, and after profiling my application using NetBeans I see constant heap consumption and eventually a server (tomcat) crash due to an "out of memory" error. The thread count also keeps increasing, and most of the threads are in the "wait" state. Please let me know what

Re: First request for search is taking a longer time and subsequent requests are very fast

2009-03-23 Thread David Causse
Hi, Searcher and IndexReader use an internal cache; when your searcher is created, the first query is slow because Lucene fills its cache. We re-use searcher and reader instances whenever possible. I've heard on this list that it's also a solution to launch warm-up queries just after reader/sear
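The warm-up idea David describes — pay the cache-fill cost once, right after the resource is (re)created, so the first user query is fast — can be sketched generically. The class and method names below are illustrative, and a trivial in-memory cache stands in for Lucene's internal caches:

```java
import java.util.HashMap;
import java.util.Map;

// Prime an internal cache immediately after (re)creating a resource, so the
// first user query does not pay the cache-fill cost. The "slow" computation
// stands in for the expensive first-time work a fresh searcher performs.
public class WarmedLookup {
    private final Map<String, Integer> cache = new HashMap<>();

    private int computeSlowly(String key) {
        return key.length();  // stand-in for the expensive first-time work
    }

    public int lookup(String key) {
        // fill the cache on first access, serve from it afterwards
        return cache.computeIfAbsent(key, this::computeSlowly);
    }

    // run the expected hot queries once, right after construction
    public void warmUp(String... hotKeys) {
        for (String k : hotKeys) lookup(k);
    }

    public boolean isWarm(String key) {
        return cache.containsKey(key);
    }
}
```

In the Lucene setting the analogue is issuing a handful of representative queries against a freshly reopened searcher before putting it into service.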

First request for search is taking a longer time and subsequent requests are very fast

2009-03-23 Thread thiruvee
Hi I am using Lucene 2.4 in our project. I am using FSDirectory to store the index. Whenever the index is updated, the first search is very slow. I am using the combination of CustomScoreQuery and DisjunctionMaxQuery for searching. I observed this slowness even when the server (tomcat/websphere) is

Re: Matching query terms

2009-03-23 Thread Michael McCandless
You mean, for each doc in the topN you want to be able to find out which terms caused it to match? This is a frequently requested feature (I think there was another thread just recently). But unfortunately there's not really a good/simple way today (I think?). Someone should at least start a wiki

Similarity

2009-03-23 Thread john atsh
I want to change the similarity function slightly, in the following way: use the same cosine similarity as defined by DefaultSimilarity, but multiply the resulting score by f, where f is defined as follows: f = (# of terms in the query that also appear in the document) / (# of terms in the document) (this boosts
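The proposed factor f can be computed per document from term sets. A toy sketch (the class name and the example term sets are made up; in practice the document's term set would come from the index):

```java
import java.util.HashSet;
import java.util.Set;

// Compute the proposed boost f = |query terms also in doc| / |doc terms|.
public class OverlapFactor {
    public static double f(Set<String> queryTerms, Set<String> docTerms) {
        if (docTerms.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(queryTerms);
        common.retainAll(docTerms);  // terms in both the query and the document
        return (double) common.size() / docTerms.size();
    }

    public static void main(String[] args) {
        Set<String> q = Set.of("lucene", "index");
        Set<String> d = Set.of("lucene", "index", "search", "java");
        System.out.println(f(q, d));  // 2 matching terms over 4 doc terms
    }
}
```

Note the denominator is the document's term count, so this factor penalizes long documents on top of whatever length normalization DefaultSimilarity already applies — worth checking that the double penalty is intended.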

Re: Corrupt index (IndexOutOfBoundsException)

2009-03-23 Thread Michael McCandless
Something appears to be wrong with your _X.tii file (inside the compound file). Can you post the code that recreates this broken index? Since it appears to be repeatable, could you regenerate your index with compound file off, confirm the problem still happens, and then post the _X.tii f

Re: Matching query terms

2009-03-23 Thread Wouter Heijke
I came across that one, but the javadoc says that it is expensive, so I was hoping for a less expensive solution... Wouter > searcher.explain definitely seems to do the trick, going through the > sub-queries. > > paul > > > On 23 Mar 2009, at 13:12, Wouter Heijke wrote: > >> I want to know for e

Re: Matching query terms

2009-03-23 Thread Paul Libbrecht
searcher.explain definitely seems to do the trick, going through the sub-queries. paul On 23 Mar 2009, at 13:12, Wouter Heijke wrote: I want to know for each term in a query if it matched the result or not. What is the best way to implement this? Highlighter seems to be able to do the tri

Matching query terms

2009-03-23 Thread Wouter Heijke
I want to know for each term in a query if it matched the result or not. What is the best way to implement this? Highlighter seems to be able to do the trick, only I don't need to 'highlight' any text. After knowing whether the terms in the query matched, I want to do something else based on this. Woute
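At its core the question is per-term set membership: given the terms a document contains, mark each query term matched or not. A toy stand-in for walking `searcher.explain()` per sub-query (class and method names are hypothetical, not a Lucene API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// For each query term, record whether it occurs in the matched document's
// term set, preserving query order in the result.
public class TermMatches {
    public static Map<String, Boolean> matches(Iterable<String> queryTerms,
                                               Set<String> docTerms) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String t : queryTerms) {
            result.put(t, docTerms.contains(t));
        }
        return result;
    }
}
```

The expensive part in real Lucene usage is obtaining the document's term set (or the per-sub-query Explanation) for each hit, which is why the javadoc warns that explain() is costly for anything beyond debugging the top few results.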

Corrupt index (IndexOutOfBoundsException)

2009-03-23 Thread René Zöpnek
Hello, I'm using Lucene 2.3.2 and had no problems until now. But now I have a corrupt index. When searching, a java.lang.OutOfMemoryError is thrown. I wrote the following test program: private static void search(String index, String query) throws CorruptIndexException, IOException, ParseEx

Re: How to know the matched field?

2009-03-23 Thread Paul Libbrecht
Thanks Erick, I browsed but no full answer yet. The closest seems to be the explain method with which I could find the exact term-query or prefix-query that matched it, so I would be able to find the name of the field. I am still left with iterating through the (stored) fields and try to f