Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
Hi Mike, I ran a burn-in test overnight, repeatedly indexing the same db in a loop. I set the heap size to 120MB and called setMaxBufferedDeleteTerms(1000); I did not call commit and used the same IndexWriter. This test passed without any errors. So to wrap this up - I shall call commit
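
A minimal sketch of the loop described above against the 2.4 API - the directory path, the id field, and the loadAllRows helper are placeholders, not part of the original test:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;

    public class BurnInTest {
        public static void main(String[] args) throws Exception {
            // Run with -Xmx120m to reproduce the constrained-heap setup.
            IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/tmp/index"),
                    new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setMaxBufferedDeleteTerms(1000); // flush buffered delete terms early
            for (int pass = 0; pass < 100; pass++) {
                for (Document doc : loadAllRows()) { // hypothetical DB-to-Document helper
                    writer.updateDocument(new Term("id", doc.get("id")), doc);
                }
                writer.commit(); // the thread's conclusion: commit periodically
            }
            writer.close();
        }
        static Iterable<Document> loadAllRows() { return java.util.Collections.<Document>emptyList(); }
    }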

Re: question about (problem with?) use of FieldCache$StringIndex

2009-06-25 Thread Ulf Dittmer
Otis Gospodnetic wrote: FieldCache class is used for sorting. Are you sorting by a few different fields by any chance? Yes, we're sorting by one or two fields, depending on user settings. Uwe Schindler wrote: This class is used when you sort your results by a field that contains

Re: question about (problem with?) use of FieldCache$StringIndex

2009-06-25 Thread Otis Gospodnetic
Ah, the trusted LIA... :) FieldCache class is used for sorting. Are you sorting by a few different fields by any chance? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Ulf Dittmer > To: java-user@lucene.apache.org > Sent: Thursday, Jun

Optimizing unordered queries

2009-06-25 Thread Nigel
I recently posted some questions about performance problems with large indexes. One key thing about our situation is that we don't need sorted results (either by relevance or any other key). I've been looking into our memory usage and tracing through some code, which in combination with the recen
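
If neither relevance nor any other ordering is needed, one option in the 2.4 API is to bypass the top-N priority queue entirely with a HitCollector; a hedged sketch, assuming searcher and query already exist:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.search.HitCollector;

    final List<Integer> docIds = new ArrayList<Integer>();
    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            docIds.add(doc); // keep every hit in index order; no sort, no top-N queue
        }
    });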

Re: Analyzing performance and memory consumption for boolean queries

2009-06-25 Thread Nigel
On Wed, Jun 24, 2009 at 4:47 PM, Uwe Schindler wrote: > Have you checked whether GC affects you? A first step would be to turn on GC > logging with -verbose:gc -XX:+PrintGCDetails > > If you see some relation between query time and GC messages, you should try > to use a better parallelized GC and ch
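
Concretely, the suggested setup might look like this on a Sun JVM - the jar name and heap size are placeholders, and -XX:+UseParallelGC is one example of a parallel collector, not the one Uwe necessarily meant:

    java -verbose:gc -XX:+PrintGCDetails -XX:+UseParallelGC -Xmx1g -jar indexer.jar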

Re: Lucene Judge

2009-06-25 Thread Grant Ingersoll
See http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/benchmark/quality/package-summary.html where it gives details on running and using TrecJudge and the quality benchmark. On Jun 25, 2009, at 3:04 PM, AlexElba wrote: Hello, I was looking to Judge interface with TrecJudge implemen
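
Going by that package summary, usage looks roughly like the following sketch - the file names are placeholders, and the qrels file follows the TREC format (qnum 0 doc-name is-relevant):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;
    import org.apache.lucene.benchmark.quality.Judge;
    import org.apache.lucene.benchmark.quality.QualityQuery;
    import org.apache.lucene.benchmark.quality.trec.TrecJudge;
    import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;

    // TrecJudge is constructed from a reader over TREC-style relevance judgments
    Judge judge = new TrecJudge(new BufferedReader(new FileReader("qrels.txt")));
    QualityQuery[] queries = new TrecTopicsReader()
            .readQueries(new BufferedReader(new FileReader("topics.txt")));
    judge.validateData(queries, new PrintWriter(System.out, true)); // sanity-check judgments vs. topics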

RE: question about (problem with?) use of FieldCache$StringIndex

2009-06-25 Thread Uwe Schindler
This class is used when you sort your results by a field that contains string values (not numerics). For each field sorted on, per index, a separate StringIndex is created that stays persistent until the index is closed (because it costs a lot of CPU to build this StringIndex, this is why the first
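
In other words, any string-sorted search populates the cache; a sketch of what triggers it, with the field name purely illustrative and searcher, query, and reader assumed to exist:

    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopFieldDocs;

    Sort sort = new Sort(new SortField("title", SortField.STRING));
    TopFieldDocs top = searcher.search(query, null, 10, sort);
    // The first sorted search builds (and pins) the same structure this returns:
    FieldCache.StringIndex idx = FieldCache.DEFAULT.getStringIndex(reader, "title");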

question about (problem with?) use of FieldCache$StringIndex

2009-06-25 Thread Ulf Dittmer
Hello- We're looking at memory issues we're having with a fair-sized web app that uses Lucene for search. While looking at heap dumps, we discovered that there were 3 instances of org.apache.lucene.search.FieldCache$StringIndex, each about 110MB in size (out of a total of 1 GB). Looking

RE: Order of fields within a Document in Lucene 2.4+

2009-06-25 Thread Sudarsan, Sithu D.
I agree. Using Lucene 2.4.1, doc.getFields() returns fields in alphabetical order, not the order in which they were added. Sincerely, Sithu D Sudarsan -Original Message- From: Matt Turner [mailto:m4tt_tur...@hotmail.com] Sent: Thursday, June 25, 2009 4:33 PM To: java-user@lucene.apache.org Subje

Order of fields within a Document in Lucene 2.4+

2009-06-25 Thread Matt Turner
The Lucene FAQ says... What is the order of fields returned by Document.fields()? * Fields are returned in the same order they were added to the document. (now getFields(), as fields() is deprecated) However, I think this may no longer be the case in 2.4. We are indexing documents in a specific
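
A small repro sketch for checking the returned order under 2.4, with all field names illustrative:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.Fieldable;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
            IndexWriter.MaxFieldLength.UNLIMITED);
    Document doc = new Document();
    doc.add(new Field("zebra", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("apple", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
    writer.close();
    IndexReader reader = IndexReader.open(dir);
    for (Object f : reader.document(0).getFields()) {
        System.out.println(((Fieldable) f).name()); // added zebra-then-apple; what comes back?
    }
    reader.close();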

Lucene Judge

2009-06-25 Thread AlexElba
Hello, I was looking at the Judge interface with the TrecJudge implementation and I am not clear how to use it. What data do I need to pass into the constructor? Does anybody have experience with this class? Thanks, Alex -- View this message in context: http://www.nabble.com/Lucene-Judge-tp24209288p24209

Re: Indexing

2009-06-25 Thread Erick Erickson
This is really a permissions problem, which has been discussed frequently. I think you'd get farther faster by searching the mail archive (see this page, near the bottom: http://lucene.apache.org/java/docs/mailinglists.html and see if those disc

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread Michael McCandless
Interesting that excessive deletes buffering is not your problem... Even if you can't post the resulting test case, if you can simplify it & run it locally, to rule out anything outside Lucene that's allocating the byte/char/byte[] arrays, that can help isolate things. Also, profilers can trace where al

Indexing

2009-06-25 Thread ManjuNadigar
In my application I am currently indexing each object with one field [ID] holding the ID of the object, which is stored, and the attributes of the object in another field [Content] holding the attribute information separated by spaces; this field is tokenized. When I search for information related to the object I get
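
A hedged sketch of that two-field layout, with field names and values purely illustrative:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // ID: stored so it can be read back from a hit, not analyzed
    doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
    // Content: space-separated attributes, tokenized for full-text search
    doc.add(new Field("content", "red large ceramic vase",
            Field.Store.NO, Field.Index.ANALYZED));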

AW: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
Hi, I'm afraid my test setup and code are far too big for this. What I use Lucene for is fairly simple. I have a database with about 150 tables; I iterate over all tables and create for each row a String representation, similar to a toString method, containing all the database data. This string is then fed tog
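
Schematically, and assuming JDBC plus hypothetical tableNames, connection, rowToString, and writer, the described loop reads like:

    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    for (String table : tableNames) {            // ~150 tables
        Statement st = connection.createStatement();
        ResultSet rs = st.executeQuery("SELECT * FROM " + table);
        while (rs.next()) {
            Document doc = new Document();
            // toString-like concatenation of every column in the row
            doc.add(new Field("content", rowToString(rs),
                    Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }
        rs.close();
        st.close();
    }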

RE: setTermInfosIndexDivisor

2009-06-25 Thread Uwe Schindler
> The culprit here is sorting. I stopped sorting, and memory consumption was > reduced by nearly 50%. Further, for testing purposes I set > setTermInfosIndexDivisor to 50, and memory consumption was further reduced. > > Currently I am sorting DateTime with minute resolution as a single string. > If I split
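
One way to build a minute-resolution sort key in 2.4 is DateTools; splitting into a coarser date field plus a separate time field would shrink the number of unique terms each StringIndex has to hold. A sketch - the field names are illustrative and doc is assumed to exist:

    import java.util.Date;
    import org.apache.lucene.document.DateTools;
    import org.apache.lucene.document.Field;

    // Single minute-resolution key, e.g. "200906251742"
    String minuteKey = DateTools.dateToString(new Date(), DateTools.Resolution.MINUTE);
    doc.add(new Field("sortdate", minuteKey, Field.Store.NO, Field.Index.NOT_ANALYZED));

    // Alternative: day-resolution date plus a separate time field (fewer unique terms per field)
    String dayKey = DateTools.dateToString(new Date(), DateTools.Resolution.DAY);
    doc.add(new Field("date", dayKey, Field.Store.NO, Field.Index.NOT_ANALYZED));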

Re: setTermInfosIndexDivisor

2009-06-25 Thread Ganesh
Thanks for your immediate response. The culprit here is sorting. I stopped sorting, and memory consumption was reduced by nearly 50%. Further, for testing purposes I set setTermInfosIndexDivisor to 50, and memory consumption was further reduced. Currently I am sorting DateTime with minute resolution a

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread Michael McCandless
OK, it looks like no merging was done. I think the next step is to call IndexWriter.setMaxBufferedDeleteTerms(1000) and see if that prevents the OOM. Mike On Thu, Jun 25, 2009 at 7:16 AM, stefan wrote: > Hi, > > Here are the results of CheckIndex. I ran this just after I got the OOM error. > > OK [4

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
Hi, Here are the results of CheckIndex. I ran this just after I got the OOM error.

    OK [4 fields]
    test: terms, freq, prox...OK [509534 terms; 9126904 terms/docs pairs; 4933036 tokens]
    test: stored fields...OK [148124 total field count; avg 2 fields per doc]
    test: term vectors...
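
For anyone wanting to repeat this, CheckIndex can be run from the command line; the jar name and index path here are placeholders:

    java -cp lucene-core-2.4.1.jar org.apache.lucene.index.CheckIndex /path/to/index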

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread Simon Willnauer
On Thu, Jun 25, 2009 at 1:13 PM, Michael McCandless wrote: > Can you post your test code?  If you can make it a standalone test, > then I can repro and dig down faster. > > Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000) > and see if that prevents the OOM? > > Mike > > On Thu,

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread Michael McCandless
Can you post your test code? If you can make it a standalone test, then I can repro and dig down faster. Can you try calling IndexWriter.setMaxBufferedDeleteTerms (eg, 1000) and see if that prevents the OOM? Mike On Thu, Jun 25, 2009 at 7:10 AM, stefan wrote: > > Hi Mike, > > I just changed my

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
Hi Mike, I just changed my test code to run in an indefinite loop over the database to index everything, set the JVM to a 120MB heap size, and kept all other parameters as before. I got an OOM error just as before - so I would say there is a leak somewhere. Here is the histogram. Heap Histogram All Class
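
For reference, such a histogram can be captured on a running Sun JVM with jmap; the pid is a placeholder:

    jmap -histo <pid>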

Re: setTermInfosIndexDivisor

2009-06-25 Thread Michael McCandless
On Thu, Jun 25, 2009 at 6:09 AM, Ganesh wrote: > > What about setTermInfosIndexDivisor?? > > Directory dir = FSDirectory.getDirectory(indexPath); > IndexReader reader = IndexReader.open(dir, true); > reader.setTermInfosIndexDivisor(5); > > It is supposed to load only one fifth of the terms available??

Re: setTermInfosIndexDivisor

2009-06-25 Thread Ganesh
What about setTermInfosIndexDivisor?? Directory dir = FSDirectory.getDirectory(indexPath); IndexReader reader = IndexReader.open(dir, true); reader.setTermInfosIndexDivisor(5); It is supposed to load only one fifth of the terms available?? But there is no difference in memory consumption with / w
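
Cleaned up, the call sequence above is valid 2.4 API; my reading of the javadocs - hedged - is that the divisor must be set before the reader first loads its term index, and that the saving only shows once term lookups actually happen:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    IndexReader reader = IndexReader.open(FSDirectory.getDirectory(indexPath), true);
    // Must be set before the term index is first used; with 5, roughly
    // every fifth indexed term is loaded into RAM.
    reader.setTermInfosIndexDivisor(5);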

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread Michael McCandless
On Thu, Jun 25, 2009 at 3:02 AM, stefan wrote: >>But a "leak" would keep leaking over time, right?  I.e. even a 1 GB heap >>on your test db should eventually throw OOME if there's really a leak. > No, not necessarily, since I stop indexing once everything is indexed - I > shall try repeated runs wit

Re: setTermInfosIndexDivisor

2009-06-25 Thread Michael McCandless
On Thu, Jun 25, 2009 at 5:40 AM, Ganesh wrote: > I am updating the status of the documents frequently. There will be a huge number > of deletes. I do optimize the index once a day. OK > I want to know the usage of setTermInfosIndexDivisor. > > Directory dir = FSDirectory.getDirectory(indexPath); >

Re: setTermInfosIndexDivisor

2009-06-25 Thread Ganesh
I am updating the status of the documents frequently. There will be a huge number of deletes. I do optimize the index once a day. I want to know the usage of setTermInfosIndexDivisor. Directory dir = FSDirectory.getDirectory(indexPath); IndexReader reader = IndexReader.open(dir, true); reader.set

Re: setTermInfosIndexDivisor

2009-06-25 Thread Michael McCandless
setTermIndexInterval only helps appreciably when an index has a truly immense number of terms (often "by accident", e.g. your document filtering/analysis process accidentally allowed binary terms into the index); it's meant primarily as a "safety" for such situations. If you run CheckIndex, it print

Re: setTermInfosIndexDivisor

2009-06-25 Thread Simon Willnauer
Hey there, On Thu, Jun 25, 2009 at 9:10 AM, Ganesh wrote: > Hello all, > > I am using Lucene v2.4.1 > > 1) > I have built multiple indexes totaling 30 million documents. My memory limit > is 512 MB. In this case I am frequently getting OOME. If I increase the > memory limit to 1 GB / 1.5 GB the

Re: wheres the word

2009-06-25 Thread Timon Roth
Hi Paul, I now tried the hint from Mark Miller... disabling all the stopwords from StandardAnalyzer: String stop_words[] = new String[0]; ...StandardAnalyzer(stop_words); this works perfectly ;-) Regards, Timon. On Thursday, 25 June 2009, Paul Libbrecht wrote: > > Le 25-juin-09 à 01:2
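
Put together, the fix amounts to constructing the analyzer with an empty stopword set; a minimal sketch against the 2.4 API:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    // Empty stop set: "be", "de", "et" etc. survive analysis and become searchable
    Analyzer analyzer = new StandardAnalyzer(new String[0]);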

setTermInfosIndexDivisor

2009-06-25 Thread Ganesh
Hello all, I am using Lucene v2.4.1 1) I have built multiple indexes totaling 30 million documents. My memory limit is 512 MB. In this case I am frequently getting OOME. If I increase the memory limit to 1 GB / 1.5 GB then it works fine. My point is that it will also get exhausted when

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
Hi, >But a "leak" would keep leaking over time, right? I.e. even a 1 GB heap >on your test db should eventually throw OOME if there's really a leak. No, not necessarily, since I stop indexing once everything is indexed - I shall try repeated runs with 120MB. >Are you calling updateDocument (which

Re: wheres the word

2009-06-25 Thread Paul Libbrecht
On 25 June 2009 at 01:28, Mark Miller wrote: I'm puzzling over the following problem: in my index I can't find the word BE, but it exists in two documents. I'm using Lucene 2.4 with the StandardAnalyzer. Other queries with words like de, et or de la work fine. Any ideas? be is a stopword. Do