Aaaah, nice idea to simply use FilterAtomicReader -- of course! So this would ultimately be a new IndexReaderFactory that creates FilterAtomicReaders for a subset of the fields you want to do this on. Cool! With that, I don't think there would be a need for DirectPostingsFormat as a postings format, would there be?
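The wrapper described above can be sketched as follows. This is a minimal sketch under stated assumptions. It uses a toy, hypothetical `PostingsSource` interface in place of Lucene's much richer Fields/Terms/TermsEnum API, and the class names are illustrative only. The point it shows is that the decision about which fields to copy into heap is made when the reader is opened, not when the index is written. Every other field passes through to whatever on-disk format the index uses. It needs Java 10+ for `List.copyOf`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a segment's postings (NOT a Lucene API). */
interface PostingsSource {
    Set<String> terms(String field);

    /** Doc ids for field:term; an empty list if the term is absent. */
    List<Integer> postings(String field, String term);
}

/**
 * Decorator in the spirit of a FilterAtomicReader: wraps any source, whatever its
 * on-disk format, and copies the postings of the chosen fields into heap when it
 * is opened. All other fields are passed through to the delegate unchanged.
 */
final class InMemoryFieldsSource implements PostingsSource {
    private final PostingsSource delegate;
    private final Map<String, Map<String, List<Integer>>> heap = new HashMap<>();

    InMemoryFieldsSource(PostingsSource delegate, Set<String> fieldsToLoad) {
        this.delegate = delegate;
        for (String field : fieldsToLoad) {
            Map<String, List<Integer>> byTerm = new HashMap<>();
            for (String term : delegate.terms(field)) {
                byTerm.put(term, List.copyOf(delegate.postings(field, term)));
            }
            heap.put(field, byTerm);
        }
    }

    @Override
    public Set<String> terms(String field) {
        Map<String, List<Integer>> byTerm = heap.get(field);
        return byTerm != null ? Collections.unmodifiableSet(byTerm.keySet()) : delegate.terms(field);
    }

    @Override
    public List<Integer> postings(String field, String term) {
        Map<String, List<Integer>> byTerm = heap.get(field);
        if (byTerm == null) {
            return delegate.postings(field, term); // not preloaded: read through
        }
        return byTerm.getOrDefault(term, List.of()); // served from heap
    }
}

/** Toy "on-disk" source that counts postings reads; for illustration only. */
final class CountingSource implements PostingsSource {
    private final Map<String, Map<String, List<Integer>>> data;
    private int reads = 0;

    CountingSource(Map<String, Map<String, List<Integer>>> data) {
        this.data = data;
    }

    @Override
    public Set<String> terms(String field) {
        return data.getOrDefault(field, Map.of()).keySet();
    }

    @Override
    public List<Integer> postings(String field, String term) {
        reads++;
        return data.getOrDefault(field, Map.of()).getOrDefault(term, List.of());
    }

    int reads() {
        return reads;
    }
}
```

`CountingSource` exists only to show that lookups on a preloaded field never reach the delegate. With a real index, the per-field choice could come from a reader factory. That wiring is not shown here.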
~ David

On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera wrote:
> The only problem is how the Codec makes a dynamic decision on whether to use the wrapped Codec for reading vs. pre-loading data into in-memory structures, because Codecs are loaded through reflection by the SPI loading mechanism.
>
> There is also a TODO in DirectPF to allow wrapping arbitrary PFs; just mentioning it in case you want to tackle DPF.
>
> I think that if we allowed passing something like a CodecLookupService, with an SPILookupService default impl, you could easily pass that to DirectoryReader, which would use your runtime logic to load the right PF (e.g. DPF) instead of the one the index was created with.
>
> But it sounds like the core problem is that when we load a Codec/PF/DVF for reading, we cannot pass it any arguments, so we must make an index-time decision about how we're going to read the data later on. If we could somehow support that, I think it would help you achieve what you want too.
>
> E.g. currently it's an all-or-nothing decision, but if we could pass a parameter like "50% available heap", the Codec/PF/DVF could cache the frequently accessed postings instead of loading all of them into memory. But that can also be achieved at the IndexReader level, through a custom FilterAtomicReader. And if you could reuse DPF's structures (like DirectTermsEnum, DirectFields...), it should be easier to do this. So perhaps we can think about a DirectAtomicReader which does that? I believe it can share some code with DPF, as long as we don't make these APIs public, or make them @super.experimental and @super.expert.
>
> Just throwing out some ideas...
>
> Shai
>
> On Mon, Apr 7, 2014 at 5:35 PM, David wrote:
>
>> Benson, I like your idea.
>>
>> I think your idea can be achieved as a codec, one that wraps another codec that establishes the on-disk format.
>> By default the wrapped codec can be Lucene's default codec. I think, if implemented, this would be a change to DPF instead of an additional DPF-variant codec.
>>
>> ~ David
>>
>> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies wrote:
>>
>>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir wrote:
>>> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies wrote:
>>> >
>>> >> My takeaway from the prior conversation was that various people didn't entirely believe that I'd seen a dramatic improvement in query performance using D-P-F, and so would not smile upon a patch intended to liberate D-P-F from codecs. It could be that the effect I saw has to do with the fact that our system depends on hitting and scoring 50% of the documents in an index with a lot of documents.
>>> >
>>> > I don't understand the word "liberate" here. Why is it such a problem that this is a codec?
>>>
>>> I don't want to have to declare my intentions at the time I create the index. I don't want to have to use D-P-F for all readers all the time. I want to be able to open an index with an arbitrary on-disk format and get the in-memory cache behavior of D-P-F. Thus 'liberate': split the question of 'keep a copy in memory' from the choice of the on-disk format.
>>>
>>> > I do not think we should give it any more status than that; it wastes too much RAM.
>>>
>>> It didn't seem like 'waste' when it solved a big practical problem for us. We had an application that was too slow and had plenty of RAM available, and we were able to trade space for time by applying D-P-F.
>>>
>>> Maybe I'm going about this backwards; if I can come up with a small, inconspicuous proposed change that does what I want, there won't be any disagreement.
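Shai suggested a middle ground between all-or-nothing loading and none: cache only the frequently accessed postings, within a heap budget such as "50% available heap". That idea, which also trades space for time but in bounded amounts, can be sketched as a byte-bounded LRU cache. This is a minimal sketch, not how DirectPostingsFormat works. `BudgetedPostingsCache` and its loader function are hypothetical names. The loader stands in for reading a term's postings from the on-disk format and must return a non-null array. The size estimate ignores array and map overhead.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical heap-budgeted postings cache (not a Lucene class). Keeps the most
 * recently used postings lists in heap up to maxBytes and re-reads evicted
 * ones through the loader, which stands in for the on-disk postings format.
 */
final class BudgetedPostingsCache {
    private final long maxBytes;
    private final Function<String, int[]> loader; // must return non-null (empty array if absent)
    // accessOrder = true: iteration runs from least to most recently accessed
    private final LinkedHashMap<String, int[]> lru = new LinkedHashMap<>(16, 0.75f, true);
    private long usedBytes = 0;
    private int loads = 0;

    BudgetedPostingsCache(long maxBytes, Function<String, int[]> loader) {
        this.maxBytes = maxBytes;
        this.loader = loader;
    }

    /** Returns the doc ids for a term, from heap if cached, else via the loader. */
    int[] postings(String term) {
        int[] hit = lru.get(term); // a hit also marks the entry most recently used
        if (hit != null) {
            return hit;
        }
        int[] loaded = loader.apply(term);
        loads++;
        long size = sizeOf(loaded);
        if (size > maxBytes) {
            return loaded; // larger than the whole budget: serve it, but don't cache it
        }
        lru.put(term, loaded);
        usedBytes += size;
        // Evict least recently used entries until back under budget. The new
        // entry is last in iteration order and fits on its own, so it survives.
        Iterator<Map.Entry<String, int[]>> it = lru.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= sizeOf(it.next().getValue());
            it.remove();
        }
        return loaded;
    }

    long usedBytes() {
        return usedBytes;
    }

    int loads() {
        return loads;
    }

    /** Rough estimate: 4 bytes per doc id, ignoring array and map overhead. */
    private static long sizeOf(int[] docIds) {
        return 4L * docIds.length;
    }
}
```

A budget tied to the heap could be computed as `Runtime.getRuntime().maxMemory() / 2`. That value is half the maximum heap, so it is only a rough proxy for "available" heap.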
