Re: Grouping on multiple shards possible in lucene?

Ravikumar Govindarajan Wed, 21 Nov 2012 05:32:49 -0800

Yeah, but IndexSorter is offline. I need an online sorter. The trouble is,
as Mike pointed out, that the delta encodings are forward-only. I do not know
of an available encoding that supports reverse traversal.


--
Ravi
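A minimal sketch of why the delta encodings are forward-only, assuming plain `int` arrays as a stand-in for an on-disk postings list (Lucene's real formats also pack the gaps into bytes or blocks). `DeltaPostings`, `encode`, `decodeForward` and `decodeReverse` are hypothetical names, not Lucene API. Each stored value is a gap from the previous docID, so the only way to walk a list backwards is to decode it forward and buffer it first:

```java
import java.util.Arrays;

// Sketch only: plain int arrays stand in for an on-disk postings list.
final class DeltaPostings {

    // Store the first docID as-is and every later docID as the gap from its
    // predecessor (input must be sorted ascending, as postings are).
    static int[] encode(int[] sortedDocIds) {
        int[] gaps = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            gaps[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return gaps;
    }

    // Forward decoding is one pass with a running sum.
    static int[] decodeForward(int[] gaps) {
        int[] docIds = new int[gaps.length];
        int doc = 0;
        for (int i = 0; i < gaps.length; i++) {
            doc += gaps[i];
            docIds[i] = doc;
        }
        return docIds;
    }

    // The absolute docID at position i is gaps[0] + ... + gaps[i], so there is
    // no way to start from the tail: decode forward into a buffer, then walk
    // the buffer backwards.
    static int[] decodeReverse(int[] gaps) {
        int[] forward = decodeForward(gaps);
        int[] reversed = new int[forward.length];
        for (int i = 0; i < forward.length; i++) {
            reversed[i] = forward[forward.length - 1 - i];
        }
        return reversed;
    }

    public static void main(String[] args) {
        int[] gaps = encode(new int[] {3, 7, 8, 20});
        System.out.println(Arrays.toString(gaps));                // [3, 4, 1, 12]
        System.out.println(Arrays.toString(decodeReverse(gaps))); // [20, 8, 7, 3]
    }
}
```

The buffering step is what makes reverse traversal costly on long posting lists; storing absolute docIDs removes the running sum at the price of larger stored values.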

On Wed, Nov 21, 2012 at 3:26 PM, Shai Erera <ser...@gmail.com> wrote:

> If you are only interested in sorting by doc addition order, then it should
> be easy to reverse the doc order in each segment, using something like
> IndexSorter.
>
> Shai
>
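For the per-segment reversal Shai suggests, the docID permutation itself is simple; what makes it a batch job is that every posting list in the segment has to be remapped, re-sorted and re-encoded. A sketch under that assumption; `ReverseSegmentOrder` and its methods are hypothetical, not the IndexSorter API:

```java
import java.util.Arrays;

// Hypothetical helper (not the IndexSorter API): the docID permutation an
// offline "reverse addition order" rewrite of a single segment would apply.
final class ReverseSegmentOrder {

    // With maxDoc docs in the segment, the last-added doc (maxDoc - 1)
    // becomes docID 0 and the first-added doc (0) becomes maxDoc - 1.
    static int newDocId(int oldDocId, int maxDoc) {
        return maxDoc - 1 - oldDocId;
    }

    // Postings must stay in increasing docID order, so every term's list is
    // remapped and then re-sorted before it could be re-encoded.
    static int[] remapPostings(int[] oldPostings, int maxDoc) {
        int[] remapped = new int[oldPostings.length];
        for (int i = 0; i < oldPostings.length; i++) {
            remapped[i] = newDocId(oldPostings[i], maxDoc);
        }
        Arrays.sort(remapped);
        return remapped;
    }

    public static void main(String[] args) {
        // A term found in the 2nd, 3rd and 6th docs of a 6-doc segment.
        System.out.println(Arrays.toString(remapPostings(new int[] {1, 2, 5}, 6)));
        // prints [0, 3, 4]
    }
}
```

Because the rewrite touches every term, it fits an offline pass over a finished segment rather than documents arriving one at a time.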
> On Wed, Nov 21, 2012 at 8:03 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> > Hi Shai,
> >
> > I only want to sort by doc addition order. Ex: d1, d2, d3 are added; the
> > true sort order is then d3, d2, d1. A solution based on doc timestamps is
> > much more involved, as you said.
> >
> > It's nice to know that you are already working on it and that there will
> > be a solution in the near future.
> >
> > In the meantime, I will live with good old sorting.
> >
> > --
> > Ravi
> >
> > On Wed, Nov 21, 2012 at 1:59 AM, Shai Erera <ser...@gmail.com> wrote:
> >
> > > Hi Ravi,
> > >
> > > I've been dealing with reverse indexing lately, so let me share with you
> > > a bit of my experience thus far.
> > >
> > > First, you need to define what reverse indexing means for you. If it
> > > means that docs indexed in the order d1, d2, d3 should be traversed
> > > during search in the order d3, d2, d1, then that's one thing.
> > > However, if it means that the traversal needs to follow e.g. the
> > > documents' timestamps, as a means to process documents from latest to
> > > oldest, then that's a totally different thing, and way more complicated.
> > >
> > > You will need to think about an IndexReader which reverses the order of
> > > the segments that it reads, so that segments are processed from latest
> > > to oldest. Also, you might need to merge the segments in reverse order
> > > too (i.e. if segments s1, s4, s5 are merged, merge them as s5, s4, s1).
> > >
> > > If you are interested in timestamp-based sorting, it gets complicated.
> > > Documents flow in from multiple producers (e.g. a parallel crawler,
> > > different processes which feed documents to the index, etc.) and are
> > > usually processed by multiple consumers (indexing threads). That makes
> > > sorting the index by a timestamp difficult.
> > >
> > > Lucene used to have IndexSorter (before 4.0), which could sort an index
> > > by a field. That was an offline process, and if that's what you're
> > > after, you should do just that and forget about the rest. If, however,
> > > you're interested in an online process, where documents are fed in some
> > > order and searched in the exact true order (latest to oldest), that's a
> > > more complicated solution. I'm still working on it :).
> > >
> > > HTH
> > >
> > > Shai
> > >
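A toy model of the reader side of Shai's suggestion. It assumes each segment already stores its own docs newest-first (the per-segment part that would need the "fun codec" Mike mentions), represents segments as lists of document names, and uses hypothetical names throughout; it is not Lucene's MultiReader API. Putting the newest segment first then gives a global latest-to-oldest order, and merging s1, s4, s5 "as s5, s4, s1" is the same concatenation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, not Lucene's MultiReader: each inner list is one segment's docs
// in stored docID order, and the outer list is ordered oldest segment first.
final class ReverseCompositeOrder {

    // Assumes every segment already stores its own docs newest-first. Visiting
    // segments newest-first then yields a global latest-to-oldest order. The
    // same concatenation describes merging s1, s4, s5 "as s5, s4, s1".
    static List<String> globalOrder(List<List<String>> segmentsOldestFirst) {
        List<String> out = new ArrayList<>();
        for (int s = segmentsOldestFirst.size() - 1; s >= 0; s--) {
            out.addAll(segmentsOldestFirst.get(s));
        }
        return out;
    }

    // docBase per segment (indexed as in the input) when the newest segment
    // is laid out first and therefore starts at global docID 0.
    static int[] reversedDocBases(int[] maxDocsOldestFirst) {
        int[] bases = new int[maxDocsOldestFirst.length];
        int base = 0;
        for (int s = maxDocsOldestFirst.length - 1; s >= 0; s--) {
            bases[s] = base;
            base += maxDocsOldestFirst[s];
        }
        return bases;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("d2", "d1"),
                Arrays.asList("d5", "d4", "d3"),
                Arrays.asList("d6"));
        System.out.println(globalOrder(segments));                                 // [d6, d5, d4, d3, d2, d1]
        System.out.println(Arrays.toString(reversedDocBases(new int[] {2, 3, 1}))); // [4, 1, 0]
    }
}
```

In this model, a merge that concatenates adjacent segments newest-first leaves the global order unchanged.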
> > > On Tue, Nov 20, 2012 at 5:37 PM, Ravikumar Govindarajan <
> > > ravikumar.govindara...@gmail.com> wrote:
> > >
> > > > > But, I think it should be possible with some fun codec & merge policy
> > > > > & MultiReader magic, to have docIDs assigned in "reverse chronological
> > > > > order"
> > > >
> > > > Can you explain it a bit more? I was thinking perhaps we could store
> > > > absolute doc-ids instead of deltas to allow reverse traversal. But this
> > > > could waste a lot of storage.
> > > >
> > > > The default merge policy merges adjacent segments, no? Would that
> > > > disturb the ordering?
> > > >
> > > > --
> > > > Ravi
> > > >
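A rough way to size Ravi's storage concern, assuming a vInt-style encoding (7 payload bits per byte, the scheme written by Lucene's `DataOutput.writeVInt`). Lucene's actual postings formats may pack gaps into blocks instead, so the numbers are illustrative only; `PostingsSize` and its methods are hypothetical:

```java
// Assumed vInt-style sizing (7 payload bits per byte, high bit = "more bytes
// follow"); real postings formats may use block packing instead.
final class PostingsSize {

    // Number of bytes a non-negative int takes as a vInt.
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Cost of storing every docID as an absolute value.
    static int absoluteBytes(int[] sortedDocIds) {
        int total = 0;
        for (int doc : sortedDocIds) {
            total += vIntBytes(doc);
        }
        return total;
    }

    // Cost of storing the first docID plus the gaps between neighbours.
    static int deltaBytes(int[] sortedDocIds) {
        int total = 0;
        int prev = 0;
        for (int doc : sortedDocIds) {
            total += vIntBytes(doc - prev);
            prev = doc;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] docs = {1_000_000, 1_000_005, 1_000_009, 1_000_100};
        System.out.println("absolute=" + absoluteBytes(docs) + " delta=" + deltaBytes(docs));
        // prints absolute=12 delta=6
    }
}
```

For these four docs, absolute IDs cost 3 bytes each while the later gaps fit in 1 byte; the difference grows with dense postings and large docIDs.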
> > > > On Tue, Nov 20, 2012 at 5:19 PM, Michael McCandless <
> > > > luc...@mikemccandless.com> wrote:
> > > >
> > > > > On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan
> > > > > <ravikumar.govindara...@gmail.com> wrote:
> > > > > > Thanks Mike. Actually, I think I can eliminate sort-by-time if I am
> > > > > > able to iterate postings in reverse doc-id order. Is this possible in
> > > > > > Lucene?
> > > > >
> > > > > Alas that is not easy to do in Lucene: the posting lists are encoded
> > > > > in forward docID order.
> > > > >
> > > > > But, I think it should be possible with some fun codec & merge policy
> > > > > & MultiReader magic, to have docIDs assigned in "reverse chronological
> > > > > order" ...
> > > > >
> > > > > > Also, for a TopN query sorted by doc-id, will the query terminate
> > > > > > early?
> > > > >
> > > > > Actually, it won't!  But it really should ... you could make a
> > > > > Collector that throws an exception once the N docs have been
> > > > > collected?
> > > > >
> > > > > Mike McCandless
> > > > >
> > > > > http://blog.mikemccandless.com
> > > > >
> > > > >
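Mike's early-termination idea, sketched without Lucene types: `DocCollector`, `StopCollecting`, `FirstNCollector` and `search` are hypothetical stand-ins for Lucene's `Collector` and `IndexSearcher`, and the sketch assumes matches are delivered in increasing docID order. If docIDs were assigned newest-first, the first N docs collected would be the N newest:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained model of an early-terminating collector. These types are
// hypothetical stand-ins, not Lucene's Collector/IndexSearcher API.
final class EarlyTermination {

    interface DocCollector {
        void collect(int docId);
    }

    // Unchecked exception used only as a "stop now" signal; stack trace
    // capture is disabled because it is thrown on the normal search path.
    static final class StopCollecting extends RuntimeException {
        StopCollecting() {
            super(null, null, false, false);
        }
    }

    // Keeps the first n docs it is handed, then aborts the search.
    static final class FirstNCollector implements DocCollector {
        final int n;
        final List<Integer> docs = new ArrayList<>();

        FirstNCollector(int n) {
            if (n < 1) {
                throw new IllegalArgumentException("n must be >= 1");
            }
            this.n = n;
        }

        @Override
        public void collect(int docId) {
            docs.add(docId);
            if (docs.size() >= n) {
                throw new StopCollecting();
            }
        }
    }

    // Stand-in for the searcher loop: feeds matches in increasing docID order
    // and treats StopCollecting as a normal early end. Returns how many docs
    // were actually visited.
    static int search(int[] matchingDocs, DocCollector collector) {
        int visited = 0;
        try {
            for (int doc : matchingDocs) {
                visited++;
                collector.collect(doc);
            }
        } catch (StopCollecting expected) {
            // The collector has everything it needs.
        }
        return visited;
    }

    public static void main(String[] args) {
        FirstNCollector top3 = new FirstNCollector(3);
        int visited = search(new int[] {2, 4, 6, 8, 10}, top3);
        System.out.println(top3.docs + " after visiting " + visited + " docs");
        // prints [2, 4, 6] after visiting 3 docs
    }
}
```

Throwing an exception keeps the collector interface unchanged; the caller only has to treat that one exception type as a normal end of search.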
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
>
