BlockTreeTermsWriter.TermsWriter.finish() writes an FST (finite state transducer) that serves as an index of the terms dictionary. This index is used at search time to seek to terms in the terms dictionary.
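The role of a terms index can be shown with a toy two-level model. Sorted terms are grouped into blocks, and a small in-memory index maps each block's first term to that block, so a seek only has to scan one block. This is a sketch under stated assumptions, not BlockTree's actual code. A `java.util.TreeMap` stands in for the FST, the fixed `BLOCK_SIZE` and the class name `TermsIndexSketch` are hypothetical, and BlockTree instead forms blocks from terms that share a prefix and stores prefix-to-block pointers in the FST.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy two-level terms dictionary: an in-memory index (stand-in for the FST) over term blocks. */
final class TermsIndexSketch {
    // Hypothetical fixed block size; BlockTree instead groups terms into blocks by shared prefix.
    static final int BLOCK_SIZE = 4;

    private final List<String[]> blocks = new ArrayList<>();          // the terms dictionary, in blocks
    private final TreeMap<String, Integer> index = new TreeMap<>();   // first term of block -> block ordinal

    TermsIndexSketch(String[] sortedTerms) {
        // Build the blocks and the index over them. In BlockTree, finish() is where
        // the completed FST index is written out.
        for (int start = 0; start < sortedTerms.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, sortedTerms.length);
            String[] block = Arrays.copyOfRange(sortedTerms, start, end);
            index.put(block[0], blocks.size());
            blocks.add(block);
        }
    }

    /** Seek: use the index to pick the single candidate block, then scan only that block. */
    boolean seekExact(String term) {
        Map.Entry<String, Integer> entry = index.floorEntry(term);
        if (entry == null) {
            return false;   // term sorts before the first indexed term
        }
        for (String candidate : blocks.get(entry.getValue())) {
            if (candidate.equals(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"apache", "codec", "finish", "fst", "index", "lucene", "postings", "terms", "writer"};
        TermsIndexSketch dict = new TermsIndexSketch(terms);
        System.out.println(dict.seekExact("lucene"));  // true
        System.out.println(dict.seekExact("lucid"));   // false
    }
}
```

The key idea is that a seek consults only the small index to narrow the search to one block. In BlockTree, the FST plays that role and is much more compact than a map of strings because it shares common prefixes (and, to a degree, suffixes).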
On Mon, Mar 28, 2016 at 14:02, Vishwas Jain <vjvis...@gmail.com> wrote:

> Thanks for the reply and information.
> While going through the code, I came up with some doubts about how the
> lucene54 codec writes the posting lists using the lucene50 postings writer.
> What exactly does the finish() method in the TermsWriter class of the
> BlockTreeTermsWriter.java file do? I have come to understand that the
> posting lists (document IDs, frequencies, etc.) are mainly written using
> the writeBlock method in the ForUtil.java file...
>
> Thanks..
>
> > On Mon, Mar 28, 2016 at 4:21 PM, Greg Bowyer <gbow...@fastmail.co.uk>
> > wrote:
> >
> >> The posting list is compressed using a specialised technique aimed at
> >> pure numbers. Currently the codec uses a variant of Patched Frame of
> >> Reference coding to perform this compression.
> >>
> >> A good survey of such techniques can be found in the standard IR books
> >> (https://mitpress.mit.edu/books/information-retrieval,
> >> http://www.amazon.com/Managing-Gigabytes-Compressing-Multimedia-Information/dp/1558605703,
> >> http://nlp.stanford.edu/IR-book/) as well as in this paper:
> >> http://eprints.gla.ac.uk/93572/1/93572.pdf.
> >>
> >> Interestingly, there are potentially some wins in finding better integer
> >> codings (one of my personal projects is aimed at doing exactly
> >> this), but I doubt LZ4-compressing the posting list would help all that
> >> much.
> >>
> >> Hope this helps
> >>
> >> On Mon, Mar 28, 2016, at 10:51 AM, Vishwas Jain wrote:
> >> > Hello,
> >> >
> >> > We are trying to implement better compression techniques in the
> >> > lucene54 codec of Apache Lucene. Currently there is no such compression
> >> > for posting lists in the lucene54 codec, but LZ4 compression is used
> >> > for stored fields. Does anyone know why there is no compression
> >> > technique for posting lists? And which compression techniques would be
> >> > beneficial if implemented?
> >> >
> >> > Thanks
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
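Greg's point about Patched Frame of Reference coding can be made concrete with a toy PFOR codec for a single block of sorted doc IDs. The doc IDs are first delta-encoded. Most deltas are then bit-packed with one shared width, and the few outliers that do not fit are stored as separate "patches". This is a sketch under stated assumptions, not Lucene's ForUtil. The class name `PForSketch` and the `MAX_EXCEPTIONS` limit are hypothetical, and Lucene's real block size, exception encoding and on-disk layout differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy patched frame-of-reference (PFOR) coding of one block of sorted doc IDs; not Lucene's ForUtil. */
final class PForSketch {
    // Hypothetical cap on how many outliers may be patched instead of widening the whole block.
    static final int MAX_EXCEPTIONS = 2;

    final int count;
    final int bits;                                     // shared bit width for every delta
    final long[] packed;                                // low `bits` bits of each delta, bit-packed
    final List<int[]> exceptions = new ArrayList<>();   // patches: {index, high bits that did not fit}

    PForSketch(int[] docIds) {
        count = docIds.length;
        // 1. Delta-encode: sorted doc IDs become small gaps.
        int[] deltas = new int[count];
        for (int i = 0, prev = 0; i < count; i++) {
            deltas[i] = docIds[i] - prev;   // assumes sorted, non-negative doc IDs
            prev = docIds[i];
        }
        // 2. Pick the width that fits every delta except the MAX_EXCEPTIONS largest.
        int[] sorted = deltas.clone();
        Arrays.sort(sorted);
        int cutoff = count == 0 ? 0 : sorted[Math.max(0, count - 1 - MAX_EXCEPTIONS)];
        bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(cutoff));
        long mask = (1L << bits) - 1;
        // 3. Pack the low bits of every delta; record the overflowing high bits as patches.
        packed = new long[(count * bits + 63) / 64];
        for (int i = 0; i < count; i++) {
            if ((deltas[i] >>> bits) != 0) {
                exceptions.add(new int[] {i, deltas[i] >>> bits});
            }
            long low = deltas[i] & mask;
            int word = (i * bits) >>> 6, shift = (i * bits) & 63;
            packed[word] |= low << shift;
            if (shift + bits > 64) {
                packed[word + 1] |= low >>> (64 - shift);   // value straddles two longs
            }
        }
    }

    int[] decode() {
        long mask = (1L << bits) - 1;
        int[] deltas = new int[count];
        for (int i = 0; i < count; i++) {
            int word = (i * bits) >>> 6, shift = (i * bits) & 63;
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            deltas[i] = (int) (v & mask);
        }
        for (int[] patch : exceptions) {
            deltas[patch[0]] |= patch[1] << bits;   // restore the high bits of each outlier
        }
        int[] docIds = new int[count];
        for (int i = 0, prev = 0; i < count; i++) {
            prev += deltas[i];   // undo delta encoding
            docIds[i] = prev;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 12, 1000, 1003, 1010, 5000};
        PForSketch block = new PForSketch(docIds);
        System.out.println(block.bits);                              // 3
        System.out.println(block.exceptions.size());                 // 2
        System.out.println(Arrays.equals(block.decode(), docIds));   // true
    }
}
```

For the doc IDs in `main`, the deltas are 3, 4, 1, 4, 988, 3, 7, 3990. The sketch therefore packs every delta in 3 bits and patches the two outliers. Plain frame-of-reference would instead need 12 bits for every value to fit 3990. This is the trade-off that motivates patching: one large gap no longer inflates the width of the whole block.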