Re: Exploiting a whole lot of memory

2013-10-08 Thread Benson Margulies
Oh, drat, I left out an 's'. I got it now. On Tue, Oct 8, 2013 at 7:40 PM, Benson Margulies wrote: > Mike, where do I find DirectPostingFormat? > > > On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> DirectPostingsFormat? >> >> It stores all terms + po

Re: Exploiting a whole lot of memory

2013-10-08 Thread Benson Margulies
Mike, where do I find DirectPostingFormat? On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > DirectPostingsFormat? > > It stores all terms + postings as simple java arrays, uncompressed. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, O

Re: Exploiting a whole lot of memory

2013-10-08 Thread Michael McCandless
DirectPostingsFormat? It stores all terms + postings as simple java arrays, uncompressed. Mike McCandless http://blog.mikemccandless.com On Tue, Oct 8, 2013 at 5:45 PM, Benson Margulies wrote: > Consider a Lucene index consisting of 10m documents with a total disk > footprint of 3G. Consider
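
To make "terms + postings as simple java arrays, uncompressed" concrete, here is a toy sketch in plain Java. It is not Lucene's DirectPostingsFormat code and uses no Lucene classes; the class `ArrayPostings` and its methods are hypothetical names chosen for illustration. It only shows the general idea of keeping a sorted term array next to one plain `int[]` of doc IDs per term, so a lookup is a binary search followed by direct array access with no decoding step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only: NOT Lucene's DirectPostingsFormat implementation.
final class ArrayPostings {
    private final String[] terms;  // sorted, so lookups can use binary search
    private final int[][] docIds;  // docIds[i] = ascending doc IDs containing terms[i]

    private ArrayPostings(String[] terms, int[][] docIds) {
        this.terms = terms;
        this.docIds = docIds;
    }

    /** Builds the arrays from whitespace-separated documents; doc ID = argument position. */
    static ArrayPostings build(String... docs) {
        TreeMap<String, List<Integer>> tmp = new TreeMap<>();
        for (int doc = 0; doc < docs.length; doc++) {
            for (String tok : docs[doc].toLowerCase(Locale.ROOT).split("\\s+")) {
                if (tok.isEmpty()) continue;
                List<Integer> list = tmp.computeIfAbsent(tok, k -> new ArrayList<>());
                // Record each doc at most once per term.
                if (list.isEmpty() || list.get(list.size() - 1) != doc) list.add(doc);
            }
        }
        // Flatten the temporary map into plain, uncompressed arrays.
        String[] terms = new String[tmp.size()];
        int[][] ids = new int[tmp.size()][];
        int i = 0;
        for (Map.Entry<String, List<Integer>> e : tmp.entrySet()) {
            terms[i] = e.getKey();
            ids[i] = e.getValue().stream().mapToInt(Integer::intValue).toArray();
            i++;
        }
        return new ArrayPostings(terms, ids);
    }

    /** Returns the doc IDs for a term, or an empty array if the term is absent. */
    int[] postings(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? docIds[idx] : new int[0];
    }

    public static void main(String[] args) {
        ArrayPostings index = build("Lucene stores postings", "arrays are simple", "Lucene uses arrays");
        System.out.println(Arrays.toString(index.postings("lucene"))); // [0, 2]
        System.out.println(Arrays.toString(index.postings("arrays"))); // [1, 2]
    }
}
```

The trade-off this sketch hints at is the one the thread is about: plain arrays cost more heap than a compressed on-disk format, in exchange for cheaper access at query time.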

Exploiting a whole lot of memory

2013-10-08 Thread Benson Margulies
Consider a Lucene index consisting of 10m documents with a total disk footprint of 3G. Consider an application that treats this index as read-only, and runs very complex queries over it. Queries with many terms, some of them 'fuzzy' and 'should' terms and a dismax. And, finally, consider doing all

Re: Analyzer classes versus the constituent components

2013-10-08 Thread Michael Sokolov
There are some Analyzer methods you might want to override (initReader for inserting a CharFilter, stuff about gaps), but if you don't need that, it seems to be mostly about packaging neatly, as you say. -Mike On 10/8/13 10:30 AM, Benson Margulies wrote: Is there some advice around about when

Re: Equivalent LatLongDistanceFilter in Lucene 4.4 API

2013-10-08 Thread David Smiley (@MITRE.org)
Hi James, The spatial module in v4 is completely different than the one in v3. It would be good for you to review the new API rather than looking for a 1-1 equivalent to a class that existed in v3. Take a look at the top level javadocs for the spatial module, and in particular look at SpatialExa

Analyzer classes versus the constituent components

2013-10-08 Thread Benson Margulies
Is there some advice around about when it's appropriate to create an Analyzer class, as opposed to just Tokenizer and TokenFilter classes? The advantage of the constituent elements is that they allow the consuming application to add more filters. The only disadvantage I see is that the following i
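
The trade-off raised here (a prepackaged chain versus components the caller can extend) can be sketched in plain Java. This is a toy illustration, not Lucene's Analyzer, Tokenizer or TokenFilter API; `ToyAnalysis` and every name in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Toy illustration in plain Java; these are NOT Lucene's analysis classes.
final class ToyAnalysis {
    /** "Tokenizer": splits on whitespace. */
    static List<String> tokenize(String text) {
        String t = text.trim();
        return t.isEmpty() ? new ArrayList<>() : new ArrayList<>(Arrays.asList(t.split("\\s+")));
    }

    /** "Filter": lowercases every token. */
    static final UnaryOperator<List<String>> LOWERCASE =
        toks -> toks.stream().map(s -> s.toLowerCase(Locale.ROOT)).collect(Collectors.toList());

    /** "Filter": drops tokens shorter than three characters. */
    static final UnaryOperator<List<String>> DROP_SHORT =
        toks -> toks.stream().filter(s -> s.length() >= 3).collect(Collectors.toList());

    /** Components: the caller decides which filters follow the tokenizer, and in what order. */
    @SafeVarargs
    static List<String> run(String text, UnaryOperator<List<String>>... filters) {
        List<String> toks = tokenize(text);
        for (UnaryOperator<List<String>> f : filters) toks = f.apply(toks);
        return toks;
    }

    /** "Analyzer": a fixed, prepackaged chain; callers cannot add filters to it. */
    static List<String> packaged(String text) {
        return run(text, LOWERCASE);
    }

    public static void main(String[] args) {
        System.out.println(packaged("An Index of Text"));                   // [an, index, of, text]
        System.out.println(run("An Index of Text", LOWERCASE, DROP_SHORT)); // [index, text]
    }
}
```

The packaged method is simpler to call, while the component form lets the consuming application append its own filter, which is the advantage described above.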

Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

2013-10-08 Thread Michael McCandless
When you open this index for searching, how much heap do you give it? In general, you should give IndexWriter the same heap size, since during merge it will need to open N readers at once, and if you have RAM-resident doc values fields, those need enough heap space. Also, the default DocValuesForm

Re: optimal way to access many TermVectors

2013-10-08 Thread Adrien Grand
Hi, On Mon, Oct 7, 2013 at 9:31 PM, Rose, Stuart J wrote: > Is there an optimal way to access many document TermVectors (in the same > chunk) consecutively when using the LZ4 termvector compression? > > I'm curious to know whether all TermVectors in a single compressed chunk are > decompressed