I think BlockTree should be better (less disk space, RAM and faster
lookups), but if you can make a benchmark comparing the two that would
help confirm/deny!
Mike McCandless
http://blog.mikemccandless.com
On Mon, Nov 18, 2013 at 9:21 PM, Ravikumar Govindarajan
wrote:
> Many thanks Mike.
>
> In
Many thanks Mike.
In a given document, I usually have 15-20 fields out of which 6-7 fields
are plain key-value fields.
Typically these key-value fields don't involve prefixes, reflexes, fuzzies
etc...It's always a full match. Non-existent values are also not possible
during search.
In such a cas
Yes, BlockTreeTermsWriter uses freezeTail to figure out where to draw
the lines for assigning terms to blocks, but to build the trie terms
index it builds a separate FST, by adding in each block's prefix (it
doesn't use the FST's builder pruning to create the trie).
Mike McCandless
http://blog.mi
Yeah, now I kind of understood.
Is this why BlockTreeTermsWriter plugs in it's freezeTail logic of meeting
min-nbr of terms per block and building a trie for locating sub-blocks?
--
Ravi
On Fri, Nov 15, 2013 at 11:17 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> When you turn on
When you turn on pruning, FST Builder will just remove nodes that
don't have a high enough count of input terms traversing through them.
E.g. if minSuffixCount1 is 100 then only FST nodes that see >= 100
input terms coming through them, are preserved.
You can use this to build a prefix trie inste
I was trying to understand some logic in Builder class of FST.
The method freezeTail() looks quite hairy. I gather that there is an some
logic for pruning a node or compiling it.
What exactly is pruning a node? An example of it will be really really
helpful
--
Ravi