Yeah that's fine, however if there isn't a Java implementation that's a lot
of extra work.  The FST approach should be a clear quick and easy win.  The
current system of in heap keys and a binary search is what the FST replaced
in Lucene.  There are plenty of references to the improvement.

On Fri, Jun 8, 2012 at 3:50 PM, Pavel Yaskevich <pove...@gmail.com> wrote:

> I would vote,  if possible, to compare it with y-fast trie [1] (it doesn't
> seem to be available java implementation unfortunately) by means of memory
> efficiency and lookup performance. As we use big integer tokens the main
> benefit from that trie could be O(log log M) predecessor lookup and compact
> in-memory size.
>
> [1] https://en.wikipedia.org/wiki/Y-fast_trie
>
> Best Regards
> --
> Pavel Yaskevich
>
>
> On Friday 8 June 2012 at 22:19, Jason Rutherglen wrote:
>
> > Ok looks like the IndexSummary encapsulates everything, I can start with
> > hacking that.
> >
> > On Fri, Jun 8, 2012 at 11:50 AM, Jason Rutherglen <
> > jason.rutherg...@gmail.com (mailto:jason.rutherg...@gmail.com)> wrote:
> >
> > > The Cassandra integration is probably beyond the time I have available.
> > > If the locations in the code that need to be rewritten to use the FST
> are
> > > known, and a patch simply 'plugs-in' the FST, that would be much
> easier.
> > > Eg, I don't know how Cassandra stores the current key index for
> example...
> > >
> > > Basically I can write FST serializing, deserializing, and key lookup
> code
> > > fairly easy by basing it on Lucene's terms dict.
> > >
> > >
> > > On Fri, Jun 8, 2012 at 6:00 AM, Radim Kolar <h...@filez.com (mailto:
> h...@filez.com)> wrote:
> > >
> > > >
> > > > If you are interested I can help, I used the FST on a Hadoop project
> > > > > to implement a fast map side range join.
> > > >
> > > > create JIRA item with patch attached, i will test it.
> > > >
> > >
> > >
> >
> >
> >
>
>
>

Reply via email to