Re: FST Builder pruning

2013-11-19 Thread Michael McCandless
I think BlockTree should be better (less disk space and RAM, and faster lookups), but if you can make a benchmark comparing the two that would help confirm/deny! Mike McCandless http://blog.mikemccandless.com On Mon, Nov 18, 2013 at 9:21 PM, Ravikumar Govindarajan wrote: > Many thanks Mike. > > In

Re: FST Builder pruning

2013-11-18 Thread Ravikumar Govindarajan
Many thanks Mike. In a given document, I usually have 15-20 fields, of which 6-7 are plain key-value fields. Typically these key-value fields don't involve prefixes, regexes, fuzzies, etc. It's always a full match. Non-existent values are also not possible during search. In such a cas

Re: FST Builder pruning

2013-11-17 Thread Michael McCandless
Yes, BlockTreeTermsWriter uses freezeTail to figure out where to draw the lines for assigning terms to blocks, but to build the trie terms index it builds a separate FST, by adding in each block's prefix (it doesn't use the FST's builder pruning to create the trie). Mike McCandless http://blog.mi

Re: FST Builder pruning

2013-11-15 Thread Ravikumar Govindarajan
Yeah, now I kind of understand. Is this why BlockTreeTermsWriter plugs in its freezeTail logic of meeting a minimum number of terms per block and building a trie for locating sub-blocks? -- Ravi On Fri, Nov 15, 2013 at 11:17 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > When you turn on

Re: FST Builder pruning

2013-11-15 Thread Michael McCandless
When you turn on pruning, the FST Builder just removes nodes that don't have a high enough count of input terms traversing through them. E.g. if minSuffixCount1 is 100, then only FST nodes that see >= 100 input terms coming through them are preserved. You can use this to build a prefix trie inste
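To make the pruning rule above concrete, here is a toy sketch in Python. It is not Lucene's Builder code: the names Node, build_trie, prune and prefixes are made up for illustration, and it uses a plain in-memory trie built in one batch rather than an FST built incrementally from sorted input. It only shows the counting idea: each node records how many input terms pass through it, and nodes below the threshold (playing the role of minSuffixCount1) are dropped, leaving a prefix trie.

```python
class Node:
    """Toy trie node (hypothetical, not a Lucene class)."""

    def __init__(self):
        self.children = {}  # char -> Node
        self.count = 0      # number of input terms passing through this node


def build_trie(terms):
    """Build a plain trie, counting the terms that traverse each node."""
    root = Node()
    for term in terms:
        node = root
        node.count += 1
        for ch in term:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    return root


def prune(node, min_count):
    """Drop every child seen by fewer than min_count input terms."""
    for ch in list(node.children):
        child = node.children[ch]
        if child.count < min_count:
            del node.children[ch]   # the whole subtree goes with it
        else:
            prune(child, min_count)


def prefixes(node, prefix=""):
    """List the prefixes that survive, in sorted order."""
    out = [prefix] if prefix else []
    for ch in sorted(node.children):
        out.extend(prefixes(node.children[ch], prefix + ch))
    return out


terms = ["car", "card", "care", "cat", "dog"]
root = build_trie(terms)
prune(root, min_count=3)   # threshold standing in for minSuffixCount1
print(prefixes(root))      # ['c', 'ca', 'car']
```

With a threshold of 3, "c" and "ca" (4 terms each) and "car" (3 terms) survive, while "cat", "card", "care" and everything under "d" (1 term each) are pruned. What remains is a trie of the shared prefixes only, not a structure that accepts the full terms.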

FST Builder pruning

2013-11-15 Thread Ravikumar Govindarajan
I was trying to understand some logic in the Builder class of FST. The method freezeTail() looks quite hairy. I gather that there is some logic for pruning a node or compiling it. What exactly is pruning a node? An example of it would be really helpful. -- Ravi