On Tue, Feb 09, 2010 at 03:47:19PM -0500, Michael McCandless wrote: > Interesting... and segment merging just does its own private > concatenation/mapping-around-deletes of the doc/positions?
I think the answer is yes, but I'm not sure I understand the question completely since I'm not sure why you'd ask that in this context. > what's a "flat positions space"? It's something Google once used. Instead of positions starting with 0 at each document, they just keep going. doc 1: "Three Blind Mice" - positions 0, 1, 2 doc 2: "Peter Peter Pumpkin Eater" - positions 3, 4, 5, 6 > And we don't return "objects or aggregates" with Multi*Enum now... Yeah, this is different. In KS right now, we use a generic PostingList, which conveys different information depending on what class of Posting it contains. > In flex right now the codec is unware that it's being "consumed" by a > Multi*Enum. Right, but in KinoSearch's case PostingList had to be aware of that because the Posting object could be consumed at either the segment level or the index level -- so it needed a setDocBase(offset) method which adjusted the doc num in the Posting. It was messy. The change I made was to eliminate PolyPostingList and PolyPostingListReader, which made it possible to remove the setDocBase() method from SegPostingList. > It still returns primitives. If instead we returned an int[] for positions > (hmm -- may be a good reason to make positions be an Attribute, Uwe), I > think it would still be OK? In the flat positions space example, it would be necessary to add an offset to each of the positions in that array. Each segment would have a "positions max" analogous to maxDoc(); these would be summed to obtain the positions offset the same way we add up maxDoc() now to obtain the doc id offset. That example may not be a deal breaker for you, but I'm not willing to guarantee that Lucy will always return primitives from these enums, now and forever, one per method call. > Still torn... I think it's convenience vs performance. But convenience for the posting format plugin developer matters too, right? Are you confident that a generic aggregator can support all possible codecs, or will plugin developers be forced to ensure that aggregation works because you've guaranteed to users like Renaud that it will? Marvin Humphrey --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org