On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless <[email protected]> wrote:
> Hi Benson,
>
> I use the code from luceneutil
> (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I
> run those scripts nightly for the nightly benchmarks:
> http://people.apache.org/~mikemccand/lucenebench
>
> But, that's the Wikipedia corpus, and has no "real" queries, and the
> scripts are quite challenging to get working ... if you have access to
> more "realistic" corpus + queries, even if you can't share it, those
> results are also interesting to share.
>
> I think it would be neat if an app could retroactively pick DirectPF
> at search time, or more generally pass search-time parameters when
> initializing codec components (I think there was a discussion about
> this at some point but I can't remember what the use case was).
> Today, any and all choices must be written into the index and cannot
> be changed at search time, which is somewhat silly/restrictive for
> DirectPF since it can wrap any other PF and act as simply a fast
> "cache" on top of the postings.

Well, that's where I thought I was starting: an API into the reader
that allows DirectPF to be injected as a wrapper around others. I
haven't had time to follow Rob's bread-crumb trail to see if this is
straightforward by customizing Directory, though it occurs to me that
we have many directories, and it would be useful to be able to do this
regardless of which one is in use.

I may be able to share a data set; I'll check into that today.

>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies <[email protected]>
> wrote:
>> What do we have for a benchmark framework that is used to
>> justify/qualify speed-related things? One way forward would be to see
>> what a quantified measurement shows from the idea I have in mind, and
>> use that to facilitate deciding if this belongs in the tree.
>>
>> On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies <[email protected]>
>> wrote:
>>> Keeping things in memory and not re-reading them from disk is what
>>> really sang the song for us. Even if the initial read-in was more
>>> costly due to decompression, the long-term amortized benefit of not
>>> re-reading would still be a big winner.
>>>
>>> On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir <[email protected]> wrote:
>>>> well the Directory layer likely isnt what probably makes DirectPF
>>>> faster for you. Its probably the fact it does no compression at all...
>>>>
>>>> On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies <[email protected]>
>>>> wrote:
>>>>>
>>>>> On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir <[email protected]> wrote:
>>>>> > That would be Directory :)
>>>>>
>>>>> Oh, how embarrassing. I could have written a custom directory to begin
>>>>> with.
>>>>>
>>>>> Would a Directory class for this purpose be an interesting patch, in
>>>>> that case? I'm not discontented about building a Directory into our
>>>>> application, but it seems like I might not be the only person to find
>>>>> this useful.
>>>>>
>>>>> > On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies
>>>>> > <[email protected]> wrote:
>>>>> >>
>>>>> >> I've had very gratifying results using the DirectPostingFormat to
>>>>> >> speed up queries when I had a read-only index with plenty of memory.
>>>>> >> The only downside was the need to specify it within the Codec, and
>>>>> >> thus write it into the index.
>>>>> >>
>>>>> >> Ever since, I've wondered if we could change things to introduce the
>>>>> >> same goodness without building it into the codec.
>>>>> >>
>>>>> >> Very roughly, I'm imagining an option in the IndexReader to provide an
>>>>> >> object that can surround the codec that is called for in the stored
>>>>> >> format.
>>>>> >>
>>>>> >> Is this an old question? Is it worth sketching a patch?
>>>>> >>
>>>>> >> ---------------------------------------------------------------------
>>>>> >> To unsubscribe, e-mail: [email protected]
>>>>> >> For additional commands, e-mail: [email protected]
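
To make the "wrapper chosen at search time" idea from this thread a bit
more concrete, here is a minimal, self-contained sketch. It deliberately
does not use Lucene's classes: PostingsSource, StoredPostings, and
InMemoryPostings are hypothetical names invented for illustration, and
the real change would have to go through whatever the codec and reader
APIs allow. It also caches lazily per term, whereas DirectPF, as far as
I understand it, loads the postings into uncompressed in-memory
structures when the reader is opened. The only point is that the
in-memory layer is picked by the code opening the reader, and nothing
about it is recorded in the stored format.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a postings reader; NOT a Lucene interface.
interface PostingsSource {
    /** Returns the doc IDs for a term, or an empty array if the term is absent. */
    int[] docIds(String term);
}

// Simulates the format recorded in the index: every lookup pays the
// read/decode cost again.
final class StoredPostings implements PostingsSource {
    private final Map<String, int[]> onDisk = new HashMap<>();
    int decodeCount = 0; // number of times the "disk" was read and decoded

    void put(String term, int... docs) {
        onDisk.put(term, docs.clone());
    }

    @Override
    public int[] docIds(String term) {
        decodeCount++;
        int[] docs = onDisk.get(term);
        return docs == null ? new int[0] : docs.clone();
    }
}

// Search-time wrapper: selected when the reader is opened, so the index
// itself does not need to know about it.
final class InMemoryPostings implements PostingsSource {
    private final PostingsSource delegate;
    private final Map<String, int[]> cache = new HashMap<>();

    InMemoryPostings(PostingsSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public int[] docIds(String term) {
        // Reads through the delegate only on the first lookup of a term.
        // The cached array is returned as-is; callers must treat it as read-only.
        return cache.computeIfAbsent(term, delegate::docIds);
    }
}

final class DirectWrapperSketch {
    public static void main(String[] args) {
        StoredPostings stored = new StoredPostings();
        stored.put("lucene", 1, 4, 9);

        // The wrapping decision is made here, at "open" time.
        PostingsSource reader = new InMemoryPostings(stored);
        reader.docIds("lucene");
        reader.docIds("lucene");
        System.out.println("decodes after two lookups: " + stored.decodeCount);
    }
}
```

The same wrapper works around any PostingsSource, which mirrors the
observation above that DirectPF can sit on top of any other postings
format. The trade-off is the one already discussed: memory use grows
with the number of terms touched, in exchange for never re-reading them.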
