Hi Robert,
The adapted codec is running, but it seems to be incredibly slow. It will take some time ;)

Here are some performance results:

Indexing scheme                Index size                 Avg. query performance  Max. query performance
PforDelta2 W Freq W Pos        20.6 GB (3.3 GB w/o .pos)  81.97 ms                1295 ms
PforDelta2 W/O Freq W/O Pos    1.6 GB                     63.33 ms                766 ms
Standard 4.0 W Freq W Pos      28.1 GB (8.1 GB w/o .prx)  77.71 ms                978 ms
Standard 4.0 W/O Freq W/O Pos  6.2 GB                     59.93 ms                718 ms
Standard 3.0 W Freq W Pos      28.1 GB (8.1 GB w/o .prx)  71.41 ms                978 ms
Standard 3.0 W/O Freq W/O Pos  6.2 GB                     72.72 ms                845 ms
PforDelta W Freq W Pos         22 GB (5 GB w/o .pos)      67.98 ms                783 ms
PforDelta W/O Freq W/O Pos     3.1 GB                     56.08 ms                596 ms
Huffman BL10 W Freq W/O Pos    2.6 GB                     216.29 ms (Mem 14 ms)   1338 ms

I am a little puzzled by the Lucene 3.0 results, because the larger index seems to be faster?!? I have already run the test several times. Are my results realistic at all? I thought PForDelta/PForDelta2 would outperform the standard index implementations in query processing.

The last result is my own implementation. I am still trying to get it smaller, because I think I can improve the compression further. For indexing I use PForDelta2 in combination with payloads; the payloads are what cause the higher runtimes. In memory it looks nice. The gap between my solution and PForDelta is already 700 MB, which I would call an improvement. :D I will have a look at it again once I have an index built with your adapted implementation.

I still have another question. The basic idea in my implementation is to create a "two-level" index structure, specialized for versioned document collections. On the first level I create a posting list entry for a document whenever a term occurs in one or more of its versions. The second level holds the corresponding term frequency information. Is it possible to build such a structure by creating a codec?
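To make the idea concrete, here is a minimal sketch of the structure in plain Java. It does not use any Lucene API and is not a codec; the class and method names (TwoLevelIndexSketch, add, intersect, frequencies) and the sample data are made up purely for illustration. It also shows the lookup pattern I have in mind: intersect on the first level, then read frequencies only for the documents that survive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical two-level index for versioned documents (illustration only). */
class TwoLevelIndexSketch {
    // Level 1: term -> documents in which at least one version contains the term.
    private final Map<String, TreeSet<Integer>> firstLevel = new TreeMap<>();
    // Level 2: term -> (docId -> (version -> term frequency)).
    private final Map<String, Map<Integer, Map<Integer, Integer>>> secondLevel = new TreeMap<>();

    /** Records that the term occurs freq times in the given version of a document. */
    void add(String term, int docId, int version, int freq) {
        firstLevel.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        secondLevel.computeIfAbsent(term, t -> new TreeMap<>())
                   .computeIfAbsent(docId, d -> new TreeMap<>())
                   .merge(version, freq, Integer::sum);
    }

    /** Boolean AND over the first level only; the second level is not touched. */
    TreeSet<Integer> intersect(List<String> terms) {
        TreeSet<Integer> result = null;
        for (String term : terms) {
            TreeSet<Integer> docs = firstLevel.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    /** Second-level lookup: per-version frequencies of a term in one document. */
    Map<Integer, Integer> frequencies(String term, int docId) {
        return secondLevel
                .getOrDefault(term, Collections.emptyMap())
                .getOrDefault(docId, Collections.emptyMap());
    }

    public static void main(String[] args) {
        TwoLevelIndexSketch index = new TwoLevelIndexSketch();
        index.add("lucene", 1, 1, 3);
        index.add("lucene", 1, 2, 4);
        index.add("lucene", 2, 1, 1);
        index.add("codec", 1, 2, 2);
        index.add("codec", 3, 1, 5);

        List<String> query = Arrays.asList("lucene", "codec");
        // Only documents in the first-level intersection trigger second-level reads.
        for (int docId : index.intersect(query)) {
            for (String term : query) {
                System.out.println(docId + " " + term + " " + index.frequencies(term, docId));
            }
        }
    }
}
```

With the sample data, only document 1 contains both terms, so the second level is read for that document alone; documents 2 and 3 are never looked up there.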
For query processing, it should evaluate the boolean query on the first level and fetch information from the second level only for documents that are in the first-level intersection. At the moment I use payloads to "simulate" a two-level structure. Normally all payloads corresponding to a query get fetched, right? If this structure were possible, there are several more approaches with promising results that could be implemented (Two-Level Diff/MSA in this paper: http://cis.poly.edu/suel/papers/version.pdf).

Regards,
Alex

--
View this message in context: http://lucene.472066.n3.nabble.com/New-codecs-keep-Freq-skip-omit-Pos-tp2849776p2855554.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.