On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> DirectPostingsFormat?
>
> It stores all terms + postings as simple Java arrays, uncompressed.

This definitely sped things up in my benchmark, but I'm greedy for more. I
just made a codec that returns it as the postings format; is that the whole
recipe? Does it make sense to extend it any further to any of the other codec
pieces?

> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Oct 8, 2013 at 5:45 PM, Benson Margulies <ben...@basistech.com> wrote:
> > Consider a Lucene index consisting of 10m documents with a total disk
> > footprint of 3G. Consider an application that treats this index as
> > read-only, and runs very complex queries over it. Queries with many terms,
> > some of them 'fuzzy' and 'should' terms and a dismax. And, finally,
> > consider doing all this on a box with over 100G of physical memory, some
> > cores, and nothing else to do with its time.
> >
> > I should probably just stop here and see what thoughts come back, but I'll
> > go out on a limb and type the word 'codec'. The MMapDirectory, of course,
> > cheerfully gets to keep every single bit in memory. And then each query
> > runs, exercising the codec, building up a flurry of Java objects, all
> > of which turn into garbage, and we start all over. So, I find myself
> > wondering, is there some sort of an opportunity for a codec-that-caches in
> > here? In other words, I'd like to sell some of my space to buy some time.
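
For anyone following along, here is a minimal sketch of the space-for-time trade discussed above: a postings list kept as delta-encoded variable-length bytes versus the same list decoded once into a plain int[]. This is not Lucene's DirectPostingsFormat code. The class DecodedPostings and its methods are hypothetical names made up for illustration, and the byte layout is only assumed to resemble a typical compressed postings encoding.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Illustration only; hypothetical class, not part of Lucene. */
final class DecodedPostings {

    /** Encodes ascending, non-negative doc IDs as variable-length deltas (7 payload bits per byte). */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int delta = docId - previous;
            // Emit low 7 bits at a time; the high bit marks "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            previous = docId;
        }
        return out.toByteArray();
    }

    /** Decodes the compact form into a plain int[] once, so later reads are simple array accesses. */
    static int[] decode(byte[] encoded, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = encoded[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 300, 70000};
        byte[] compressed = encode(docIds);
        int[] direct = decode(compressed, docIds.length);
        System.out.println(compressed.length + " bytes compressed vs "
                + (direct.length * Integer.BYTES) + " bytes as int[]");
        System.out.println("round trip ok: " + Arrays.equals(docIds, direct));
    }
}
```

The decoded array is larger than the encoded bytes, but each query that walks it skips the per-byte decode loop, which is the kind of exchange being asked about here.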