On Wed, Feb 10, 2010 at 06:58:01AM -0500, Michael McCandless wrote:

> But why didn't you have the Multi*Enums layer add the offset (so that
> the codec need not know who's consuming it)? Performance?

That would have involved something like this within the aggregator:

    posting.setDocID(posting.getDocID() + docBase)

The problem is that that's the docID the SegPostingList is using for its deltas. If the SegPostingList skips during a call to advance(), it needs to reset that docID to what the skip data says -- but if the aggregator layer doesn't tell it that it needs to account for a docBase, the new docID will lose the offset.

Can't solve that problem at the aggregator level either -- the aggregator doesn't know when skipping is occurring, so it can't intervene on an as-needed basis. The fix was to make SegPostingList aware of a docBase, so that on skipping it could add it to the docID in the skip data and land at the right docID from the perspective of the consumer. Messy.

I suppose another possibility would have been to have the aggregator keep its own Posting and copy all data over from the SegPostingList's Posting on each iteration, then add its offset. However, that would have been a lot less efficient, and it still wouldn't have worked for the "flat positions space" example because the generic aggregator would not have known about the needs of the specific codec.

> > That example may not be a deal breaker for you, but I'm not willing
> > to guarantee that Lucy will always return primitives from these
> > enums, now and forever, one per method call.
>
> But it'd be a major API change down the road to change this, for
> Lucy/KS?

I suppose so. It's either foreclose on the possibility of aggregating (Lucy), or foreclose on the possibility of using properties that cannot be aggregated (Lucene).

> Also, this is why we're adding Attribute* to all the postings enums,
> with flex -- any codec & consumer can use their own private
> attributes. The attrs pass through Multi*Enum.

Hmm. Does that mean that the consumer needs to refresh the attributes with each iteration? Because what happens when you switch sub-enums within the Multi*Enum?
Don't those attributes go stale, as they belong to a sub-enum that has finished?

Marvin Humphrey
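
For illustration, here is a rough Java sketch of the docBase-aware skipping described above. It is not the actual Lucy or Lucene code: the class name SegPostingListSketch, the plain int arrays standing in for encoded postings and skip data, and the NO_MORE_DOCS sentinel are all assumptions made to keep the example small.

```java
/**
 * Hypothetical, simplified per-segment posting list. Not the real Lucy or
 * Lucene class: plain arrays stand in for encoded postings and skip data.
 */
final class SegPostingListSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] deltas;    // gaps between successive segment-local doc IDs
    private final int[] skipDocs;  // segment-local doc ID stored at each skip point
    private final int[] skipPos;   // index into deltas of the posting at each skip point
    private final int docBase;     // offset of this segment in the consumer's doc ID space
    private int pos = -1;          // index of the current posting
    private int docID;             // running doc ID as the consumer sees it

    SegPostingListSketch(int[] deltas, int[] skipDocs, int[] skipPos, int docBase) {
        this.deltas = deltas;
        this.skipDocs = skipDocs;
        this.skipPos = skipPos;
        this.docBase = docBase;
        this.docID = docBase;      // deltas accumulate on top of the offset
    }

    /** Move to the next posting and return its offset doc ID. */
    int next() {
        if (pos + 1 >= deltas.length) {
            docID = NO_MORE_DOCS;
            return docID;
        }
        pos++;
        docID += deltas[pos];
        return docID;
    }

    /** Move to the first posting after the current one with doc ID >= target. */
    int advance(int target) {
        // Jump to the farthest skip point that lies ahead of us but below target.
        for (int i = skipDocs.length - 1; i >= 0; i--) {
            if (skipPos[i] > pos && skipDocs[i] + docBase < target) {
                pos = skipPos[i];
                // Skip data holds segment-local doc IDs, so the offset has to be
                // re-applied here. Using skipDocs[i] alone would drop it, and an
                // outer aggregator has no way of knowing that a skip happened.
                docID = skipDocs[i] + docBase;
                break;
            }
        }
        int doc;
        do {
            doc = next();
        } while (doc < target);
        return doc;
    }
}
```

With segment-local doc IDs 1, 4, 9, 12, 20, 25 (deltas 1, 3, 5, 3, 8, 5), one skip point at index 3 holding doc 12, and a docBase of 100, advance(118) jumps to the skip point, resets the running docID to 112 rather than 12, and lands on 120. If the offset were instead added by the aggregator after the fact, the same skip would reset the docID to 12 and every later delta would be applied to the wrong base.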