Re: Anticipating a benchmark for direct posting format

[email protected] Mon, 07 Apr 2014 09:33:06 -0700

Aaaah, nice idea to simply use FilterAtomicReader -- of course!  So this
would ultimately be a new IndexReaderFactory that creates
FilterAtomicReaders for a subset of the fields you want to do this on.
Cool!  With that, I don't think there would be a need for
DirectPostingsFormat as a postings format, would there be?
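To make the reader-level idea concrete, here is a minimal sketch. It deliberately avoids Lucene's real FilterAtomicReader/Fields/TermsEnum classes so that it runs on its own. PostingsReader, DiskReader and InMemoryFieldsReader are hypothetical stand-ins, and a postings list is modeled as a plain int[] of doc IDs. The wrapper copies the postings of a chosen subset of fields into heap memory when it is opened and delegates every other field to the wrapped reader. That is the same space-for-time trade D-P-F makes, but decided at open time rather than at index time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InMemoryFieldsDemo {

    /** Hypothetical stand-in for a segment reader: (field, term) -> postings (doc IDs). */
    interface PostingsReader {
        List<String> terms(String field);
        int[] postings(String field, String term);
    }

    /** Hypothetical "on-disk" reader; backed by a map here only to keep the sketch runnable. */
    static final class DiskReader implements PostingsReader {
        final Map<String, Map<String, int[]>> data;
        int postingsCalls = 0; // counts how often the "disk" is read

        DiskReader(Map<String, Map<String, int[]>> data) { this.data = data; }

        public List<String> terms(String field) {
            Map<String, int[]> f = data.get(field);
            return f == null ? Collections.emptyList() : new ArrayList<>(f.keySet());
        }

        public int[] postings(String field, String term) {
            postingsCalls++;
            Map<String, int[]> f = data.get(field);
            return f == null ? null : f.get(term);
        }
    }

    /** Filter-style wrapper: eagerly copies the selected fields into heap, delegates the rest. */
    static final class InMemoryFieldsReader implements PostingsReader {
        private final PostingsReader in;
        private final Map<String, Map<String, int[]>> cache = new HashMap<>();

        InMemoryFieldsReader(PostingsReader in, Set<String> fieldsToLoad) {
            this.in = in;
            for (String field : fieldsToLoad) {
                Map<String, int[]> byTerm = new HashMap<>();
                for (String term : in.terms(field)) {
                    byTerm.put(term, in.postings(field, term).clone());
                }
                cache.put(field, byTerm);
            }
        }

        public List<String> terms(String field) {
            Map<String, int[]> f = cache.get(field);
            return f != null ? new ArrayList<>(f.keySet()) : in.terms(field);
        }

        public int[] postings(String field, String term) {
            Map<String, int[]> f = cache.get(field);
            return f != null ? f.get(term) : in.postings(field, term);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, int[]>> index = new HashMap<>();
        Map<String, int[]> body = new HashMap<>();
        body.put("lucene", new int[] {0, 2, 5});
        body.put("codec", new int[] {1, 2});
        index.put("body", body);
        Map<String, int[]> title = new HashMap<>();
        title.put("direct", new int[] {3});
        index.put("title", title);

        DiskReader disk = new DiskReader(index);
        // Decide at open time which fields get the in-memory treatment.
        PostingsReader reader = new InMemoryFieldsReader(disk, Set.of("body"));
        int loadCalls = disk.postingsCalls; // one read per "body" term while opening

        reader.postings("body", "lucene");  // served from the heap copy
        reader.postings("title", "direct"); // delegated to the wrapped reader
        System.out.println(loadCalls + " " + disk.postingsCalls); // prints "2 3"
    }
}
```

Only the lookup on the unwrapped field reaches the wrapped reader after opening. How this maps onto the actual Lucene reader and IndexReaderFactory classes is not shown here.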


~ David


On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera <[email protected]> wrote:

> The only problem is how the Codec makes a dynamic decision on whether to
> use the wrapped Codec for reading vs pre-load data into in-memory
> structures, because Codecs are loaded through reflection by the SPI loading
> mechanism.
>
> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
> mentioning in case you want to tackle DPF.
>
> I think that if we allowed passing something like a CodecLookupService,
> with an SPILookupService default impl, you could easily pass that to
> DirectoryReader which will use your runtime logic to load the right PF
> (e.g. DPF) instead of the one the index was created with.
>
> But it sounds like the core problem is that when we load a Codec/PF/DVF
> for reading, we cannot pass it any arguments, and so we must make an
> index-time decision about how we're going to read the data later on. If we
> could somehow support that, I think that would help you achieve what you
> want too.
>
> E.g. currently it's an all-or-nothing decision, but if we could pass a
> parameter like "50% available heap", the Codec/PF/DVF could cache the
> frequently accessed postings instead of loading all of them into memory.
> But, that can also be achieved at the IndexReader level, through a custom
> FilterAtomicReader. And if you could reuse DPF's structures (like
> DirectTermsEnum, DirectFields...), it should be easier to do this. So
> perhaps we can think about a DirectAtomicReader which does that? I believe
> it can share some code w/ DPF, as long as we don't make these APIs public,
> or make them @super.experimental and @super.expert.
>
> Just throwing out some ideas...
>
> Shai
>
>
> On Mon, Apr 7, 2014 at 5:35 PM, [email protected] <[email protected]> wrote:
>
>> Benson, I like your idea.
>>
>> I think your idea can be achieved as a codec, one that wraps another
>> codec that establishes the on-disk format.  By default the wrapped codec
>> can be Lucene's default codec.  I think, if implemented, this would be a
>> change to DPF instead of an additional DPF-variant codec.
>>
>> ~ David
>>
>>
>> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies <[email protected]> wrote:
>>
>>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir <[email protected]> wrote:
>>> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies <[email protected]> wrote:
>>> >
>>> >>
>>> >> My takeaway from the prior conversation was that various people didn't
>>> >> entirely believe that I'd seen a dramatic improvement in query performance
>>> >> using D-P-F, and so would not smile upon a patch intended to liberate
>>> >> D-P-F from codecs. It could be that the effect I saw has to do with
>>> >> the fact that our system depends on hitting and scoring 50% of the
>>> >> documents in an index with a lot of documents.
>>> >>
>>> >
>>> > I don't understand the word "liberate" here. Why is it such a problem
>>> > that this is a codec?
>>>
>>> I don't want to have to declare my intentions at the time I create
>>> the index. I don't want to have to use D-P-F for all readers all the
>>> time. I want to be able to decide to open up an index with an
>>> arbitrary on-disk format and get the in-memory cache behavior of
>>> D-P-F. Thus 'liberate': split the question of 'keep a copy in
>>> memory' from the choice of the on-disk format.
>>>
>>>
>>> >
>>> > I do not think we should give it any more status than that; it wastes
>>> > too much RAM.
>>>
>>> It didn't seem like 'waste' when it solved a big practical problem for us. We
>>> had an application that was too slow, and had plenty of RAM available,
>>> and we were able to trade space for time by applying D-P-F.
>>>
>>> Maybe I'm going about this backwards; if I can come up with a small,
>>> inconspicuous proposed change that does what I want, there won't be
>>> any disagreement.
>>>
>>>
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: [email protected]
>>> > For additional commands, e-mail: [email protected]
>>> >
>>>
>>>
>>>
>>
>
