Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-04-09 Thread Ruben Laguna
Take a memory snapshot with JConsole -> dumpHeap [1] and then analyze it with Eclipse MAT [2]. Find the biggest objects and look at their paths to GC roots to see whether Lucene is actually retaining them. You may also want to look at two recently closed bug reports about memory leaks, [3] and [4]. [1] htt
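Besides JConsole, a heap snapshot can be triggered from code via the JVM's diagnostic MBean, which is handy when the OOM is hard to catch interactively. A minimal sketch (the class name HeapDump and the temp-file path are illustrative, not from the thread); the resulting .hprof file is what Eclipse MAT opens:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;

public class HeapDump {
    // Dumps the current JVM's heap to an .hprof file that Eclipse MAT can open.
    public static void dump(String path) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true);  // true = dump only live (reachable) objects
    }

    public static void main(String[] args) throws Exception {
        String path = File.createTempFile("snapshot", ".hprof").getAbsolutePath();
        new File(path).delete();  // dumpHeap refuses to overwrite an existing file
        dump(path);
        System.out.println("wrote " + new File(path).length() + " bytes");
    }
}
```

In MAT, the "path to GC roots" view on the dominator-tree's largest objects shows exactly which reference chain keeps them alive.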

java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-04-09 Thread Herbert Roitblat
Hi, folks. I am using PyLucene and retrieving a lot of tokens. lucene.py reports version 2.4.0. It is rPath Linux with 8GB of memory. Python is 2.4. I'm not sure what the max heap is; I think it is maxheap='2048m'. I think it's running in a 64-bit environment. It indexes a set of 116,
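"GC overhead limit exceeded" usually means the heap is nearly full and the collector is spending almost all its time reclaiming very little, so the first thing to confirm is how much heap the JVM actually got (in PyLucene this is set via the maxheap argument to initVM). A minimal check, assuming nothing beyond the standard Runtime API:

```java
public class HeapInfo {
    // Reports the maximum heap the JVM will attempt to use, in megabytes.
    // With maxheap='2048m' this should print roughly 2048 (minus some
    // space reserved by the collector, depending on the JVM).
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMb());
    }
}
```

If the reported value is far below what the 8GB machine could provide, raising maxheap is the cheapest experiment before hunting for a leak.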

Indexing documents generated from a template

2010-04-09 Thread Eric Hauser
Hi, I'm doing research on indexing some documents that are generated from templates. I don't have the exact statistics yet, but I estimate that in the standard case 90% of the document is the same across all instances and the other 10% is dynamic (although it is certainly poss
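One way to exploit that 90% overlap is to strip out the shared template text before indexing and index only the dynamic fragments per document. A hypothetical sketch, not from the thread: it assumes the template marks its dynamic slots with ${...} placeholders, and uses only plain Java string handling:

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateSplit {
    // Hypothetical helper: given a ${...}-style template and a document
    // generated from it, return only the fragments that differ from the
    // template, so the shared portion need not be indexed per document.
    public static List<String> dynamicParts(String template, String document) {
        List<String> parts = new ArrayList<>();
        // Fixed text between placeholders; -1 keeps trailing empty segments.
        String[] fixed = template.split("\\$\\{[^}]*\\}", -1);
        int pos = 0;
        for (int i = 0; i < fixed.length - 1; i++) {
            int start = document.indexOf(fixed[i], pos) + fixed[i].length();
            int end = fixed[i + 1].isEmpty()
                    ? document.length()
                    : document.indexOf(fixed[i + 1], start);
            parts.add(document.substring(start, end));
            pos = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        String template = "Dear ${name}, your order ${id} has shipped.";
        String doc = "Dear Alice, your order 42 has shipped.";
        System.out.println(dynamicParts(template, doc)); // [Alice, 42]
    }
}
```

The extracted fragments would then go into their own Lucene fields, while the template text is indexed once (or not at all, if queries never need it).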

Removing terms in the Index

2010-04-09 Thread Fotos fotos
Hello! I am a beginner with Lucene. I need to do the following: I have a text file with these terms: "Lucene in action", "Lucene", and a file with these sentences: 1 - "Lucene in action now." 2 - "Lucene for Dummies" 3 - "Managing Gigabytes" I need to search in phrases of do
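Before reaching for Lucene, the matching the poster describes can be stated in a few lines of plain Java; this sketch (class and method names are illustrative) does a case-insensitive substring check per phrase, which is roughly what a Lucene PhraseQuery per term would answer at scale:

```java
import java.util.*;

public class PhraseMatch {
    // For each sentence, collect the phrases from the term file it contains.
    public static Map<String, List<String>> match(List<String> phrases,
                                                  List<String> sentences) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (String sentence : sentences) {
            List<String> found = new ArrayList<>();
            for (String phrase : phrases) {
                if (sentence.toLowerCase().contains(phrase.toLowerCase())) {
                    found.add(phrase);
                }
            }
            hits.put(sentence, found);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("Lucene in action", "Lucene");
        List<String> sentences = Arrays.asList(
                "Lucene in action now.", "Lucene for Dummies", "Managing Gigabytes");
        System.out.println(match(phrases, sentences));
    }
}
```

Sentence 1 matches both phrases, sentence 2 matches only "Lucene", and sentence 3 matches neither; with a real index each phrase would become a PhraseQuery against the sentence field.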

Re: Lucene Partition Size

2010-04-09 Thread Karl Wettin
It's hard for me to say why this is slow. Here are a few more questions whose answers might provide further clues: What were the reasons that led you to partition the index this way? What does the searcher implementation look like? What would a typical query sent to that searcher look like? I wou
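The usual shape of a partitioned search, regardless of how the poster implemented theirs, is to fan the query out to every partition in parallel and merge the per-partition results. A stdlib-only sketch (not the poster's code; each "partition" here is just an in-memory list standing in for a Lucene Searchable), similar in spirit to Lucene's ParallelMultiSearcher:

```java
import java.util.*;
import java.util.concurrent.*;

public class PartitionedSearch {
    // Submits the same query to every partition concurrently, then
    // concatenates the per-partition hit lists into one result.
    public static List<String> search(List<List<String>> partitions, String term)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (List<String> partition : partitions) {
            futures.add(pool.submit(() -> {
                List<String> hits = new ArrayList<>();
                for (String doc : partition) {
                    if (doc.contains(term)) hits.add(doc);
                }
                return hits;
            }));
        }
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : futures) merged.addAll(f.get());
        pool.shutdown();
        return merged;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("lucene scoring", "java gc"),
                Arrays.asList("lucene partitions", "heap dumps"));
        System.out.println(search(partitions, "lucene"));
    }
}
```

If a real setup searches partitions sequentially instead, total latency is the sum of the per-partition times rather than the maximum, which is one common reason a partitioned index feels slow.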