Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-04 Thread Pablo Saavedra
I hope that helps, if you find anything interesting do post it somewhere. I'm afraid I'm a little bit far away from New Orleans at the moment. Regards. 2008/11/4 Todd Benge <[EMAIL PROTECTED]> > Thanks Pablo. > > I'll be flying to New Orleans tomorrow for ApacheCon and would love > the opportuni

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-04 Thread Todd Benge
Thanks Pablo. I'll be flying to New Orleans tomorrow for ApacheCon and would love the opportunity to talk with others about architectures others are using. Todd On 11/4/08, PabloS <[EMAIL PROTECTED]> wrote: > > Sure Todd, > > the idea basically consists of the following: > > - Subclassing Field

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-04 Thread PabloS
Sure Todd, the idea basically consists of the following: - Subclassing FieldSortedHitQueue and calling support with an empty SortField array: this disables caching because the comparators are retrieved during construction - Creating a new SortComparatorSource that creates the sort comparators sim

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-03 Thread Todd Benge
Pablo, Would you mind adding a little more detail about how you're working around the problem? I'm still evaluating our different options so am interested in what you did. Todd On Mon, Nov 3, 2008 at 2:37 PM, PabloS <[EMAIL PROTECTED]> wrote: > > Thanks hossman, but I've already 'solved' the pr

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-03 Thread PabloS
Thanks hossman, but I've already 'solved' the problem without the need to patch lucene. I had to code a bit around Lucene's visibility restrictions but I've managed to completely skip the field caching mechanism and add ehcache to it. At the moment it seems to be working quite well, although not

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-11-03 Thread Chris Hostetter
: I'm having a similar problem with my application, although we are using : lucene 2.3.2. The problem we have is that we are required to sort on most of : the fields (20 at least). Is there any way of changing the cache being used? there is a patch in Jira that takes a completely different approa

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-31 Thread PabloS
Thanks for the quick reply :). For now, I'd settle with just storing cache values in soft references so at least the GC would be able to free up some space when it needs to. I think I'll just try to override the default sorting mechanism by subclassing FieldSortedHitQueue. I'll let you know how i
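
The soft-reference idea mentioned above can be sketched roughly as follows. This is a minimal, generic illustration in modern Java, not Lucene's FieldCache API and not the code PabloS actually wrote; the class name SoftValueCache and its get method are hypothetical.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper (not part of Lucene): a cache whose values are held
// only through SoftReferences, so the GC may reclaim them under memory pressure.
final class SoftValueCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    // Returns the cached value, or rebuilds it with the loader if it was
    // never cached or has been cleared by the garbage collector.
    // Note: two threads may both run the loader for the same key; that is
    // wasteful but harmless for a cache of recomputable values.
    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftValueCache<String, int[]> cache = new SoftValueCache<>();
        // Stand-in for an expensive per-field sort array.
        int[] ords = cache.get("title", field -> new int[] {2, 0, 1});
        System.out.println(ords.length);
    }
}
```

The trade-off is that a cleared entry must be rebuilt on the next sort, so this swaps an OutOfMemoryError for occasional slow queries rather than reducing the memory a sort needs.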

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-31 Thread Mark Miller
20 fields on a huge index? Wow - not sure there is a ton you can do with that...anyone have any suggestions for that one? Distributed should help I suppose, but that's a lot of sort fields for a large index. If LUCENE-831 ever gets off the ground you will be able to change the cache used, and p

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-31 Thread PabloS
Hi, I'm having a similar problem with my application, although we are using lucene 2.3.2. The problem we have is that we are required to sort on most of the fields (20 at least). Is there any way of changing the cache being used? I can't seem to find a way, since the cache is being accessed using

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread Todd Benge
rs, > Mark > > > > > > - Original Message > From: Mark Miller <[EMAIL PROTECTED]> > To: "java-user@lucene.apache.org" > Sent: Thursday, 30 October, 2008 10:37:48 > Subject: Re: OutOfMemory Problems Lucene 2.4 / Tomcat > > Michaels got

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread mark harwood
to write a more optimized custom field cache then the above code may be a useful start point. Cheers, Mark - Original Message From: Mark Miller <[EMAIL PROTECTED]> To: "java-user@lucene.apache.org" Sent: Thursday, 30 October, 2008 10:37:48 Subject: Re: OutOfMemory Pr

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread Mark Miller
Michael's got some great points (he's the Lucene master), especially possibly turning off norms if you can, but for an index like that I'd recommend Solr. Solr sharding can be scaled to billions (min a billion or two anyway) with few limitations (of course there are a few). Plus it has further

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread Michael McCandless
The terms index (*.tii), which is loaded entirely into RAM, can consume an unexpectedly large amount of memory when there are an unusually high number of terms. If you are not using compound file format, can you look at the size of *.tii? If this is what is affecting you, one simple wor
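
One way to check the *.tii sizes asked about above is a small standalone program like the sketch below. It uses only the JDK file API, not Lucene; the class name TiiSize is hypothetical, and it assumes a non-compound index directory where the *.tii files are visible as separate files. On-disk size is only a rough proxy for the RAM the loaded terms index occupies.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: adds up the on-disk size of all *.tii files
// directly inside one index directory.
final class TiiSize {
    static long totalTiiBytes(Path indexDir) throws IOException {
        long total = 0;
        // The glob matches file names ending in ".tii" in this directory only.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*.tii")) {
            for (Path file : files) {
                total += Files.size(file);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(totalTiiBytes(dir) + " bytes in *.tii files under " + dir);
    }
}
```

With around 100 indices, running this once per index directory and summing the results gives a first idea of whether the terms index is a significant part of the heap.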

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Todd Benge
Thanks Mark. I appreciate the help. I thought our memory may be low but wanted to verify whether there is any way to control memory usage. I think we'll likely upgrade the memory on the machines but that may just delay the inevitable. Wondering if anyone else has encountered similar issues wit

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Mark Miller
The term, terminfo, indexreader internals stuff is prob on the low end compared to the size of your field caches (needed for sorting). If you are sorting by String I think the space needed is 32 bits x number of docs + an array to hold all of the unique terms. So checking 300 million docs (I kn
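
To make the estimate above concrete, here is a back-of-the-envelope calculation in Java. It follows the rule of thumb stated in the message (32 bits per document plus the unique terms); the class name, the unique-term count, and the average bytes per term are illustrative assumptions, not figures from the thread.

```java
// Hypothetical calculator for the rule of thumb above: sorting on a String
// field needs one 32-bit entry per document plus storage for the unique terms.
final class StringSortCacheEstimate {
    // One int (4 bytes) per document.
    static long perDocArrayBytes(long numDocs) {
        return numDocs * Integer.BYTES;
    }

    // Per-document array plus unique terms at an assumed average size each.
    static long estimateBytes(long numDocs, long uniqueTerms, long avgBytesPerTerm) {
        return perDocArrayBytes(numDocs) + uniqueTerms * avgBytesPerTerm;
    }

    public static void main(String[] args) {
        long docs = 300_000_000L; // total document count mentioned in the thread
        System.out.println(perDocArrayBytes(docs)); // prints 1200000000

        // Assumed values for illustration only: 50M unique terms, 40 bytes each.
        System.out.println(estimateBytes(docs, 50_000_000L, 40L));
    }
}
```

So the per-document array alone comes to about 1.2 GB per String sort field across 300 million documents, before counting the term values themselves, and that cost repeats for every additional sort field.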

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Todd Benge
There's usually only a couple sort fields and a bunch of terms in the various indices. The terms are user entered on various media so the number of terms is very large. Thanks for the help. Todd On 10/29/08, Todd Benge <[EMAIL PROTECTED]> wrote: > Hi, > > I'm the lead engineer for search on a

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Mark Miller
How many fields are you sorting on? Lots of unique terms in those fields? - Mark On Oct 29, 2008, at 6:03 PM, "Todd Benge" <[EMAIL PROTECTED]> wrote: Hi, I'm the lead engineer for search on a large website using lucene for search. We're indexing about 300M documents in ~ 100 indices.