Yeah, but IndexSorter is offline. I need an online sorter. The trouble is, as Mike pointed out, that the delta encodings are forward-only. I do not know of an available encoding that supports reverse traversal.
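To make the forward-only point concrete, here is a minimal sketch in plain Python. It is not Lucene's actual postings format; `encode_deltas`, `decode_forward` and `decode_reverse` are names made up for illustration, and it assumes only that each docID is stored as the gap from the previous one.

```python
from typing import Iterator, List


def encode_deltas(doc_ids: List[int]) -> List[int]:
    """Store each docID as the gap from the previous one (first gap is from 0)."""
    deltas = []
    previous = 0
    for doc_id in doc_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def decode_forward(deltas: List[int]) -> Iterator[int]:
    """Recover docIDs with a running sum; each value depends on all earlier gaps."""
    current = 0
    for delta in deltas:
        current += delta
        yield current


def decode_reverse(deltas: List[int]) -> List[int]:
    """Reverse iteration needs the last docID first, which is only known after
    summing every gap, so the whole list is decoded before anything is returned."""
    return list(decode_forward(deltas))[::-1]


postings = [3, 7, 8, 15, 42]
deltas = encode_deltas(postings)
print(deltas)                  # [3, 4, 1, 7, 27]
print(decode_reverse(deltas))  # [42, 15, 8, 7, 3]
```

In this model, reading newest-first costs a full forward pass per posting list. Storing absolute docIDs would avoid that pass, but at the storage cost mentioned further down the thread.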
--
Ravi

On Wed, Nov 21, 2012 at 3:26 PM, Shai Erera <ser...@gmail.com> wrote:

> If you are only interested in doc addition sorting, then it should be easy
> to reverse the doc orders in each segment, using something like
> IndexSorter.
>
> Shai
>
> On Wed, Nov 21, 2012 at 8:03 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > Hi Shai,
> >
> > I would only want to sort based on doc additions. Ex: d1,d2,d3. Then true
> > sort order means d3,d2,d1. A doc-timestamp-based solution is much more
> > involved, like you said.
> >
> > It's nice to know that you are already working on it and there will be a
> > solution in the near future.
> >
> > In the meantime, I will live with good old sorting.
> >
> > --
> > Ravi
> >
> > On Wed, Nov 21, 2012 at 1:59 AM, Shai Erera <ser...@gmail.com> wrote:
> >
> > > Hi Ravi,
> > >
> > > I've been dealing with reverse indexing lately, so let me share with
> > > you a bit of my experience thus far.
> > >
> > > First, you need to define what reverse indexing means for you. If it
> > > means that docs that were indexed in the following order: d1, d2, d3
> > > should be traversed during search in that order: d3, d2, d1 - then
> > > that's one thing.
> > > However, if it means that the traversal needs to occur by e.g. the
> > > documents' timestamp, as a means to process documents from latest to
> > > oldest, then that's a totally different thing, and way more
> > > complicated.
> > >
> > > You will need to think about an IndexReader which reverses the order
> > > of the segments that it reads, so that segments are processed from
> > > latest to oldest. Also, you might need to merge the segments in
> > > reverse order too (i.e. if segments s1, s4, s5 are merged, merge them
> > > as s5, s4, s1).
> > >
> > > If you are interested in timestamp-based sorting, it gets complicated.
> > > Documents flow in from multiple producers (e.g. a parallel crawler,
> > > different processes which feed documents to the index, etc.) and are
> > > usually processed by multiple consumers (indexing threads). That makes
> > > sorting the index based on a timestamp difficult.
> > >
> > > Lucene used to have IndexSorter (before 4.0) which could sort an index
> > > by a field. That was an offline process and if that's what you're
> > > after -- you should do just that and forget about the rest. If however
> > > you're interested in an on-line process, where documents are fed in
> > > some order and searched in the exact true order (latest to oldest),
> > > that's a more complicated solution -- I'm still working on it :).
> > >
> > > HTH
> > >
> > > Shai
> > >
> > > On Tue, Nov 20, 2012 at 5:37 PM, Ravikumar Govindarajan <
> > > ravikumar.govindara...@gmail.com> wrote:
> > >
> > > > But, I think it should be possible with some fun codec & merge policy
> > > > & MultiReader magic, to have docIDs assigned in "reverse chronological
> > > > order"
> > > >
> > > > Can you explain it a bit more? I was thinking perhaps we store
> > > > absolute doc-ids instead of deltas to do reverse traversal. But this
> > > > could waste a lot of storage.
> > > >
> > > > The default merge policy will merge adjacent segments, no? Is it going
> > > > to disturb the ordering?
> > > >
> > > > --
> > > > Ravi
> > > >
> > > > On Tue, Nov 20, 2012 at 5:19 PM, Michael McCandless <
> > > > luc...@mikemccandless.com> wrote:
> > > >
> > > > > On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan
> > > > > <ravikumar.govindara...@gmail.com> wrote:
> > > > > > Thanks Mike. Actually, I think I can eliminate sort-by-time, if I
> > > > > > am able to iterate postings in reverse doc-id order. Is this
> > > > > > possible in Lucene?
> > > > >
> > > > > Alas that is not easy to do in Lucene: the posting lists are encoded
> > > > > in forward docID order.
> > > > >
> > > > > But, I think it should be possible with some fun codec & merge policy
> > > > > & MultiReader magic, to have docIDs assigned in "reverse
> > > > > chronological order" ...
> > > > >
> > > > > > Also, for a TopN query sorted by doc-id will the query terminate
> > > > > > early?
> > > > >
> > > > > Actually, it won't! But it really should ... you could make a
> > > > > Collector that throws an exception once the N docs have been
> > > > > collected?
> > > > >
> > > > > Mike McCandless
> > > > >
> > > > > http://blog.mikemccandless.com
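
Mike's idea of a collector that throws once N docs have been collected can be sketched roughly as below. This is a plain-Python model of the control flow only, not Lucene's Collector API: `CollectionTerminated`, `FirstNCollector` and `search` are hypothetical names, and the `search` loop merely stands in for a searcher that feeds matching docIDs in increasing order.

```python
from typing import Iterable, List


class CollectionTerminated(Exception):
    """Hypothetical signal used to abort the search loop early."""


class FirstNCollector:
    """Keeps the first n docIDs it is given, then aborts collection."""

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self.hits: List[int] = []

    def collect(self, doc_id: int) -> None:
        self.hits.append(doc_id)
        if len(self.hits) >= self.n:
            # Nothing collected later could be part of the result, so stop now.
            raise CollectionTerminated()


def search(matching_docs: Iterable[int], collector: FirstNCollector) -> int:
    """Stand-in for the searcher: feeds docIDs in order, returns how many it visited."""
    visited = 0
    try:
        for doc_id in matching_docs:
            visited += 1
            collector.collect(doc_id)
    except CollectionTerminated:
        pass  # expected: the collector already has everything it needs
    return visited


collector = FirstNCollector(3)
visited = search(range(1_000_000), collector)
print(collector.hits, visited)  # [0, 1, 2] 3
```

The shortcut is only valid when the requested sort order matches collection order, as with sort-by-docID. If docIDs were assigned in reverse chronological order, as discussed above, the first N collected would presumably also be the N newest.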