Hi all,

I used the pruning package with the LA Times collection. The initial LA Times index was created with the Lucene benchmark algorithm files (benchmark/conf/*.alg). Luke shows 131896 documents and 635614 terms for the initial index. I pruned it with the CarmelTopKPruning policy, using epsilon = 0.1 and varying k. However, my results do not match those of the original paper (Static Index Pruning for Information Retrieval Systems, Carmel et al.). The Lucene scoring function could be one reason, but the difference is large, so I wonder whether the package has been tested with LA Times and whether similar results were obtained.
What could be the reason for such a difference? I count the number of postings by summing the document frequency over all terms, i.e. for each term: counter += te.docFreq(); Do you know of any paper that uses this package for its experiments?

k    Prune (%), original paper   Prune (%), pruning package   # postings, pruned index   # postings, unpruned index
1    49.2                        91                           3663309                    37860694
5    40.2                        90                           4139019
10   36.4                        89                           4485072
15   34.2                        88                           4743474
50   x                           69                           11990022

Thanks in advance,
Best regards,
ZP
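
For reference, the "Prune (%), pruning package" column can be recomputed from the posting counts above. This is only a minimal sketch: it assumes the percentage means the share of postings removed relative to the unpruned index (37860694 postings), and the class name PruneRatio and method prunedPercent are hypothetical, not part of the pruning package.

```java
class PruneRatio {
    // Share of postings removed, in percent, relative to the unpruned index.
    // Assumption: "Prune (%)" = 100 * (1 - pruned / original).
    static double prunedPercent(long prunedPostings, long originalPostings) {
        return 100.0 * (1.0 - (double) prunedPostings / originalPostings);
    }

    public static void main(String[] args) {
        long original = 37860694L;  // postings in the unpruned index
        int[] ks = {1, 5, 10, 15, 50};
        long[] pruned = {3663309L, 4139019L, 4485072L, 4743474L, 11990022L};

        for (int i = 0; i < ks.length; i++) {
            System.out.printf("k=%d pruned=%.1f%%%n",
                    ks[i], prunedPercent(pruned[i], original));
        }
    }
}
```

Under that assumption the values come out a little below the whole-number figures in the table (about 90.3 for k = 1 and 68.3 for k = 50), which suggests the table figures were rounded up. Either way they remain far from the 49.2 to 34.2 range reported in the paper.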