Hi Benson,

I use the code from luceneutil
(https://code.google.com/a/apache-extras.org/p/luceneutil/ ); e.g., I
run those scripts to produce the nightly benchmarks:
http://people.apache.org/~mikemccand/lucenebench

But that uses the Wikipedia corpus, which has no "real" queries, and
the scripts are quite challenging to get working. If you have access
to a more "realistic" corpus + queries, then even if you can't share
the data itself, the results would still be interesting to share.

I think it would be neat if an app could retroactively pick DirectPF
at search time, or more generally pass search-time parameters when
initializing codec components (I think this was discussed at some
point, but I can't remember what the use case was). Today, every such
choice must be written into the index and cannot be changed at search
time. That is somewhat silly/restrictive for DirectPF, since it can
wrap any other PF and act as simply a fast "cache" on top of the
postings.
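
To make the "cache on top of the postings" idea concrete, here is a
rough plain-Java sketch. It deliberately does not use Lucene's APIs:
PostingsSource, DeltaEncodedSource and DirectCache are hypothetical
names invented for this illustration, and the delta encoding is only a
stand-in for whatever the wrapped format does on disk. It also caches
lazily per term, which is an assumption of the sketch and not a
description of how DirectPF itself loads data.

```java
import java.util.HashMap;
import java.util.Map;

public class DirectCacheSketch {

    /** Hypothetical stand-in for a postings reader: term -> sorted doc IDs. */
    interface PostingsSource {
        int[] docs(String term);
    }

    /** Simulates a stored format that must decode (delta-decode) on every read. */
    static final class DeltaEncodedSource implements PostingsSource {
        private final Map<String, int[]> deltas = new HashMap<>();
        int decodeCount = 0; // how many times a postings list was decoded

        void add(String term, int... sortedDocIds) {
            int[] encoded = new int[sortedDocIds.length];
            int prev = 0;
            for (int i = 0; i < sortedDocIds.length; i++) {
                encoded[i] = sortedDocIds[i] - prev; // store gaps, not absolute IDs
                prev = sortedDocIds[i];
            }
            deltas.put(term, encoded);
        }

        @Override
        public int[] docs(String term) {
            int[] encoded = deltas.get(term);
            if (encoded == null) {
                return new int[0]; // unknown term: empty postings
            }
            decodeCount++;
            int[] out = new int[encoded.length];
            int acc = 0;
            for (int i = 0; i < encoded.length; i++) {
                acc += encoded[i]; // rebuild absolute doc IDs from the gaps
                out[i] = acc;
            }
            return out;
        }
    }

    /** Search-time wrapper: decodes each term once, then serves plain arrays. */
    static final class DirectCache implements PostingsSource {
        private final PostingsSource delegate;
        private final Map<String, int[]> cache = new HashMap<>();

        DirectCache(PostingsSource delegate) {
            this.delegate = delegate;
        }

        @Override
        public int[] docs(String term) {
            return cache.computeIfAbsent(term, delegate::docs);
        }
    }

    public static void main(String[] args) {
        DeltaEncodedSource disk = new DeltaEncodedSource();
        disk.add("lucene", 3, 7, 42);

        // The wrapper is chosen here, at "search time"; nothing about it
        // is recorded in the (simulated) stored format.
        PostingsSource cached = new DirectCache(disk);
        cached.docs("lucene");
        cached.docs("lucene");
        System.out.println("decodes: " + disk.decodeCount);
    }
}
```

The point of the sketch is only that the wrapper is picked by the
reader and leaves the stored data untouched, so the same index could be
opened with or without it.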


Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies <[email protected]> wrote:
> What do we have for a benchmark framework that is used to
> justify/qualify speed-related things? One way forward would be to see
> what a quantified measurement shows from the idea I have in mind, and
> use that to facilitate deciding if this belongs in the tree.
>
> On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies <[email protected]> wrote:
>> Keeping things in memory and not re-reading them from disk is what
>> really sang the song for us. Even if the initial read-in was more
>> costly due to decompression, the long-term amortized benefit of not
>> re-reading would still be a big winner.
>>
>>
>> On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir <[email protected]> wrote:
>>> Well, the Directory layer likely isn't what makes DirectPF faster for
>>> you. It's probably the fact that it does no compression at all...
>>>
>>>
>>> On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies <[email protected]> wrote:
>>>>
>>>> On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir <[email protected]> wrote:
>>>> > That would be Directory :)
>>>>
>>>> Oh, how embarrassing. I could have written a custom directory to begin
>>>> with.
>>>>
>>>> Would a Directory class for this purpose be an interesting patch, in
>>>> that case? I'm not discontented about building a Directory into our
>>>> application, but it seems like I might not be the only person to find
>>>> this useful.
>>>>
>>>> >
>>>> >
>>>> > On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies <[email protected]> wrote:
>>>> >>
>>>> >> I've had very gratifying results using the DirectPostingsFormat to
>>>> >> speed up queries when I had a read-only index with plenty of memory.
>>>> >> The only downside was the need to specify it within the Codec, and
>>>> >> thus write it into the index.
>>>> >>
>>>> >> Ever since, I've wondered if we could change things to introduce the
>>>> >> same goodness without building it into the codec.
>>>> >>
>>>> >> Very roughly, I'm imagining an option in the IndexReader to provide an
>>>> >> object that can surround the codec that is called for in the stored
>>>> >> format.
>>>> >>
>>>> >> Is this an old question? Is it worth sketching a patch?
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe, e-mail: [email protected]
>>>> >> For additional commands, e-mail: [email protected]
>>>> >>
>>>> >
>>>>
>>>
>
