The format is unfortunately rather intricate...

FST = finite state transducer (see e.g. http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html). We use it to hold the terms index (*.tip), which is loaded into RAM.
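To make "finite state transducer" a little more concrete, here is a toy sketch in Java. It is not Lucene's FST code: the class ToyFst and its methods are hypothetical, and the outputs are plain non-negative longs chosen for the example. Each key is a path of labeled arcs. Outputs ride on the arcs and are summed along the path. Structurally identical sub-graphs, such as the common suffix "top" of "stop" and "top", are stored only once. Sharing both prefixes (as in a trie) and suffixes is what keeps an FST compact enough to hold in RAM. For simplicity the sketch builds a plain trie first and merges identical nodes afterwards. The real builder works differently: it consumes terms in sorted order and shares nodes as it goes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy acyclic transducer mapping string keys to non-negative long outputs (not Lucene's FST). */
final class ToyFst {
    static final class Node {
        final TreeMap<Character, Arc> arcs = new TreeMap<>();
        boolean isFinal;
        long finalOutput;   // extra output emitted when a key ends at this node
    }
    static final class Arc {
        long output;
        Node target;
        Arc(long output, Node target) { this.output = output; this.target = target; }
    }
    private Node root = new Node();

    /** Adds a distinct key with output >= 0; make all add() calls before minimize(). */
    void add(String key, long output) {
        Node node = root;
        long remaining = output;
        for (char c : key.toCharArray()) {
            Arc arc = node.arcs.get(c);
            if (arc == null) {
                arc = new Arc(remaining, new Node());   // new path: the leftover output rides here
                node.arcs.put(c, arc);
                remaining = 0;
            } else {
                // Keep only the shared part on this arc; push the excess one level down
                // so keys that already pass through the arc keep the same total.
                long common = Math.min(arc.output, remaining);
                long excess = arc.output - common;
                for (Arc child : arc.target.arcs.values()) child.output += excess;
                if (arc.target.isFinal) arc.target.finalOutput += excess;
                arc.output = common;
                remaining -= common;
            }
            node = arc.target;
        }
        node.isFinal = true;
        node.finalOutput = remaining;
    }

    /** A key's output is the sum of the arc outputs on its path plus the final output. */
    Long get(String key) {
        Node node = root;
        long sum = 0;
        for (char c : key.toCharArray()) {
            Arc arc = node.arcs.get(c);
            if (arc == null) return null;
            sum += arc.output;
            node = arc.target;
        }
        return node.isFinal ? sum + node.finalOutput : null;
    }

    /** Stores structurally identical sub-graphs (e.g. shared suffixes) once; returns node count. */
    int minimize() {
        root = canonical(root, new HashMap<>(), new IdentityHashMap<>());
        return countNodes();
    }

    private static Node canonical(Node node, Map<String, Node> register, Map<Node, Integer> ids) {
        StringBuilder sig = new StringBuilder().append(node.isFinal).append(' ').append(node.finalOutput);
        for (Map.Entry<Character, Arc> e : node.arcs.entrySet()) {
            Arc arc = e.getValue();
            arc.target = canonical(arc.target, register, ids);   // canonicalize children first
            sig.append(' ').append((int) e.getKey()).append(',').append(arc.output)
               .append(',').append(ids.get(arc.target));
        }
        Node existing = register.putIfAbsent(sig.toString(), node);
        if (existing != null) return existing;   // reuse the identical node registered earlier
        ids.put(node, ids.size());
        return node;
    }

    int countNodes() { return count(root, Collections.newSetFromMap(new IdentityHashMap<>())); }

    private static int count(Node node, Set<Node> seen) {
        if (!seen.add(node)) return 0;   // already counted via another path
        int total = 1;
        for (Arc arc : node.arcs.values()) total += count(arc.target, seen);
        return total;
    }

    public static void main(String[] args) {
        ToyFst fst = new ToyFst();
        fst.add("stop", 7);   // in a terms index the outputs would point at on-disk data
        fst.add("top", 4);
        System.out.println("trie nodes: " + fst.countNodes());     // trie nodes: 8
        System.out.println("after sharing: " + fst.minimize());    // after sharing: 5
        System.out.println(fst.get("stop") + " " + fst.get("top") + " " + fst.get("to"));   // 7 4 null
    }
}
```

Pushing outputs toward the root (keeping only the shared minimum on an arc) is what lets different keys end up with identical sub-graphs. Here the whole "t-o-p" tail is shared, and the two keys differ only in the outputs on their first arcs.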
The blocks exist because we encode between 25 and 48 terms together as one block. Blocks are chosen according to how the terms share prefixes, which gives better compression and faster lookup. It's a variant of a burst trie (see e.g. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499). The index points to the start of each block, so when looking up a term we use the index to figure out which block may contain the term (if any), seek there, and scan for it.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Nov 16, 2012 at 3:57 AM, wgggfiy <wuqiu....@qq.com> wrote:
> Hi, guys. I'm now studying Lucene 4.0 and have run into difficulties. Compared
> to previous versions, the term dictionary is different in this version. What is
> a block? And what is the FST? Please help me, thx.
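The block and index description in the reply can be sketched the same way. This is again a hypothetical Java illustration, not Lucene's terms dictionary code. The class BlockTermsSketch, the maxBlockSize parameter, and the in-memory lists that stand in for files are all made up. Sorted terms are grouped burst-trie style: if more than maxBlockSize terms share a prefix, that prefix is "burst" into one group per next character. Each block stores the shared prefix once and only the suffixes of its terms, which is a simple form of prefix compression. A HashMap from block prefix to block number plays the role of the FST index. A lookup finds the longest indexed prefix of the term, "seeks" to that block, and scans it. Unlike the 25 to 48 range described above, the sketch enforces only a maximum, so it can produce very small blocks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sorted terms split into prefix-sharing blocks plus an in-RAM index. */
final class BlockTermsSketch {
    /** A block stores its shared prefix once and then only each term's remaining suffix. */
    static final class Block {
        final String prefix;
        final List<String> suffixes = new ArrayList<>();
        Block(String prefix) { this.prefix = prefix; }
    }

    final List<Block> blocks = new ArrayList<>();        // stands in for the block data on disk
    final Map<String, Integer> index = new HashMap<>();   // stands in for the terms index (the FST)

    /** sortedTerms must be sorted and free of duplicates; maxBlockSize is a made-up knob. */
    BlockTermsSketch(List<String> sortedTerms, int maxBlockSize) {
        build(sortedTerms, "", maxBlockSize);
    }

    private void build(List<String> terms, String prefix, int max) {
        if (terms.size() <= max) {   // small enough: all terms under this prefix form one block
            addBlock(prefix, terms);
            return;
        }
        // Too many terms share this prefix: "burst" it into one group per next character.
        int i = 0;
        if (terms.get(0).length() == prefix.length()) {   // a term equal to the prefix sorts first
            addBlock(prefix, terms.subList(0, 1));
            i = 1;
        }
        while (i < terms.size()) {
            char next = terms.get(i).charAt(prefix.length());
            int j = i;
            while (j < terms.size() && terms.get(j).charAt(prefix.length()) == next) j++;
            build(terms.subList(i, j), prefix + next, max);
            i = j;
        }
    }

    private void addBlock(String prefix, List<String> terms) {
        Block block = new Block(prefix);
        for (String term : terms) block.suffixes.add(term.substring(prefix.length()));
        index.put(prefix, blocks.size());   // the index points to the start of the block
        blocks.add(block);
    }

    /** Uses the index to pick the one block that may hold term, then scans that block. */
    boolean contains(String term) {
        for (int len = term.length(); len >= 0; len--) {   // the longest indexed prefix wins
            Integer blockNumber = index.get(term.substring(0, len));
            if (blockNumber == null) continue;
            Block block = blocks.get(blockNumber);                 // "seek" to the block
            return block.suffixes.contains(term.substring(len));   // "scan" it
        }
        return false;   // no block can contain this term
    }

    public static void main(String[] args) {
        List<String> terms = List.of("lucene", "lucid", "luck", "lucky", "lunar", "lunch", "map", "mop");
        BlockTermsSketch dict = new BlockTermsSketch(terms, 3);   // tiny max keeps the output short
        for (Block b : dict.blocks) System.out.println("\"" + b.prefix + "\" " + b.suffixes);
        System.out.println(dict.contains("lucky") + " " + dict.contains("lunatic"));   // true false
    }
}
```

With these eight terms and a maximum of 3, the sketch produces five blocks: "luce" [ne], "luci" [d], "luck" [, y], "lun" [ar, ch] and "m" [ap, op]. Only one block can ever hold a given term, so a lookup touches one block. A HashMap of full prefix strings shares nothing between its keys, which is the kind of waste an FST-based index avoids.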