Thanks Mike. Actually, I think I can eliminate sort-by-time if I am able to iterate postings in reverse doc-id order. Is this possible in Lucene? Also, for a top-N query sorted by doc id, will the query terminate early?
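To make the question concrete, here is a minimal plain-Java sketch. It assumes documents are added in time order, so a larger doc id means a newer document. Lucene's `DocIdSetIterator.nextDoc()` returns matches in increasing doc-id order, so the sketch takes matches in ascending order and keeps only the last `n` in a bounded deque. `NewestDocIds` and `newestFirst` are hypothetical names and are not Lucene API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class NewestDocIds {
    // Hypothetical helper: matches arrive in ascending doc-id order, the order a
    // forward-only postings iterator produces. Keep only the last n of them.
    static List<Integer> newestFirst(int[] ascendingDocIds, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        Deque<Integer> window = new ArrayDeque<>(n);
        for (int doc : ascendingDocIds) {
            if (window.size() == n) {
                window.removeFirst(); // evict the oldest (smallest) doc id kept so far
            }
            window.addLast(doc);
        }
        // Emit the largest doc id first: newest first, if docs were added in time order.
        window.descendingIterator().forEachRemaining(result::add);
        return result;
    }

    public static void main(String[] args) {
        int[] matches = {3, 7, 8, 15, 21, 42};
        System.out.println(newestFirst(matches, 3)); // prints [42, 21, 15]
    }
}
```

Because the newest matches come last in forward order, this sketch must see every match before it can return, so it cannot stop early.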
--
Ravi

On Fri, Nov 16, 2012 at 9:40 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Yes, this is possible using Lucene's grouping APIs.
>
> It looks like index-time grouping won't work, since you get the same
> parent spread out across time, but you can use two-pass grouping
> instead: run the FirstPassGroupingCollector on each shard, get the
> top groups from each, merge those and pick the top N groups, run the
> SecondPassGroupingCollector to get TopGroups from each shard, and then
> use TopGroups.merge to merge the results.
>
> Lucene provides the APIs to do this, but it's up to you to send
> requests out to the other shards, gather the results, call the merge, etc.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Nov 16, 2012 at 9:43 AM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> > The formatter has wrecked the table, so I am reposting it.
> >
> > Please read it as {ENTITY, PARENT, DATE, SHARD} tuples (dates are dd/mm/yyyy):
> >
> > ENTITY  PARENT  DATE        SHARD
> > M1      C1      12/11/2010  A1
> > M2      C2      12/11/2011  A2
> > M3      C4      12/02/2012  A3
> > M4      C1      12/11/2012  A4
> > M5      C2      13/11/2012  A4
> > M6      C3      14/11/2012  A4
> >
> > I need to group these by parent, ordered by time. The shards
> > themselves are in increasing order of time (A1 to A4 in ascending
> > order of time).
> >
> > So if, for some search, the matched entities are M1, M2, M3, M4 and M6,
> > the results returned should be *C3, C2, C1, C4*.
> >
> > I am aware of grouping search in Lucene, but is it possible to extend
> > it to multiple shards? More importantly, are there ways I can
> > reorganize my Documents at index time to optimize query performance
> > for such a grouping feature?
> >
> > --
> > Ravi
> >
> > On Fri, Nov 16, 2012 at 8:05 PM, Ravikumar Govindarajan <
> > ravikumar.govindara...@gmail.com> wrote:
> >
> >> We are trying to do a grouping search that spans multiple shards,
> >> ordered by time.
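Mike's two-pass recipe can be sketched as a plain-Java simulation over the sample table in the thread. This sketch rests on several assumptions and does not use Lucene's API. `ShardedGrouping`, `firstPass`, `mergeGroups` and `secondPass` are hypothetical stand-ins for the FirstPassGroupingCollector, merging the per-shard top groups, the SecondPassGroupingCollector and TopGroups.merge. The sketch also assumes each parent is ranked by its most recent matching entity.

```java
import java.time.LocalDate;
import java.util.*;
import java.util.stream.Collectors;

public class ShardedGrouping {
    // One indexed entity: the {ENTITY, PARENT, DATE, SHARD} tuple from the thread.
    static final class Row {
        final String entity, parent, shard;
        final LocalDate date;
        Row(String entity, String parent, LocalDate date, String shard) {
            this.entity = entity; this.parent = parent;
            this.date = date; this.shard = shard;
        }
    }

    static LocalDate newer(LocalDate a, LocalDate b) {
        return a.isAfter(b) ? a : b;
    }

    // The n groups with the most recent dates, newest first.
    static Map<String, LocalDate> topGroups(Map<String, LocalDate> groups, int n) {
        Map<String, LocalDate> top = new LinkedHashMap<>();
        groups.entrySet().stream()
            .sorted(Map.Entry.<String, LocalDate>comparingByValue().reversed())
            .limit(n)
            .forEach(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }

    // Pass 1 on one shard: rank each parent by its newest matching entity.
    static Map<String, LocalDate> firstPass(List<Row> shardHits, int n) {
        Map<String, LocalDate> newest = new HashMap<>();
        for (Row r : shardHits) {
            newest.merge(r.parent, r.date, ShardedGrouping::newer);
        }
        return topGroups(newest, n);
    }

    // Coordinator: merge every shard's top groups and keep the global top n.
    static List<String> mergeGroups(List<Map<String, LocalDate>> perShard, int n) {
        Map<String, LocalDate> all = new HashMap<>();
        for (Map<String, LocalDate> shardTop : perShard) {
            shardTop.forEach((g, d) -> all.merge(g, d, ShardedGrouping::newer));
        }
        return new ArrayList<>(topGroups(all, n).keySet());
    }

    // Pass 2 on one shard: collect the matching entities of the chosen groups.
    static Map<String, List<Row>> secondPass(List<Row> shardHits, List<String> groups) {
        Map<String, List<Row>> docs = new HashMap<>();
        for (Row r : shardHits) {
            if (groups.contains(r.parent)) {
                docs.computeIfAbsent(r.parent, k -> new ArrayList<>()).add(r);
            }
        }
        return docs;
    }

    static Map<String, List<String>> search(List<Row> index, Set<String> matched, int n) {
        // Split the matching rows by shard, as if each shard ran the query locally.
        Map<String, List<Row>> byShard = index.stream()
            .filter(r -> matched.contains(r.entity))
            .collect(Collectors.groupingBy(r -> r.shard, TreeMap::new, Collectors.toList()));
        List<String> groups = mergeGroups(byShard.values().stream()
            .map(hits -> firstPass(hits, n)).collect(Collectors.toList()), n);
        // Run pass 2 on every shard, then merge each group's entities, newest first.
        Map<String, List<Row>> merged = new HashMap<>();
        for (List<Row> hits : byShard.values()) {
            secondPass(hits, groups).forEach((g, rows) ->
                merged.computeIfAbsent(g, k -> new ArrayList<>()).addAll(rows));
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String g : groups) {
            result.put(g, merged.get(g).stream()
                .sorted(Comparator.comparing((Row r) -> r.date).reversed())
                .map(r -> r.entity)
                .collect(Collectors.toList()));
        }
        return result;
    }

    static List<Row> sampleIndex() {
        return Arrays.asList(
            new Row("M1", "C1", LocalDate.of(2010, 11, 12), "A1"),
            new Row("M2", "C2", LocalDate.of(2011, 11, 12), "A2"),
            new Row("M3", "C4", LocalDate.of(2012, 2, 12), "A3"),
            new Row("M4", "C1", LocalDate.of(2012, 11, 12), "A4"),
            new Row("M5", "C2", LocalDate.of(2012, 11, 13), "A4"),
            new Row("M6", "C3", LocalDate.of(2012, 11, 14), "A4"));
    }

    public static void main(String[] args) {
        Set<String> matched = new HashSet<>(Arrays.asList("M1", "M2", "M3", "M4", "M5", "M6"));
        // prints {C3=[M6], C2=[M5, M2], C1=[M4, M1], C4=[M3]}
        System.out.println(search(sampleIndex(), matched, 4));
    }
}
```

Merging only each shard's top N groups is enough. A parent that belongs in the global top N must also be in the top N of the shard that holds its newest match. Under this newest-match rule, the expected order C3, C2, C1, C4 appears when M5 also matches. With exactly M1, M2, M3, M4 and M6, the same rule gives C3, C1, C4, C2.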