Hi all,

I used the pruning package with the LA Times collection. The initial LA Times
index was created with the Lucene benchmark/conf/*.alg files. Luke shows 131896
documents and 635614 terms in the initial index. I pruned it with the
CarmelTopKPruning policy, using epsilon = 0.1 and varying k. However, my results
do not match those of the original paper ("Static Index Pruning for Information
Retrieval Systems" by Carmel et al.). The Lucene scoring function may be one
reason, but the difference is large, so I wonder whether the package has been
tested with LA Times and whether similar results were obtained.
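A minimal sketch of the top-k rule, assuming the formulation in Carmel et al.:
for each term t, let z_t be the k-th highest score in its posting list and
remove the postings that score below epsilon * z_t. The class and method names
are hypothetical, scores are plain doubles rather than Lucene scores, and both
the strict "below" comparison and keeping lists with at most k postings
unchanged are assumptions. This is not the pruning package's own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of Carmel-style top-k pruning for one term's posting list. */
class TopKPruneSketch {

    /** A posting: document id plus the term's score in that document. */
    static final class Posting {
        final int doc;
        final double score;

        Posting(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Keeps postings whose score is at least epsilon * z, where z is the k-th
     * highest score in the list. Lists with at most k postings are returned
     * unchanged (an assumption about how short lists are handled).
     */
    static List<Posting> prune(List<Posting> postings, int k, double epsilon) {
        if (postings.size() <= k) {
            return new ArrayList<>(postings);
        }
        // Copy the scores and sort them ascending; the k-th highest is k from the end.
        double[] scores = postings.stream().mapToDouble(p -> p.score).toArray();
        Arrays.sort(scores);
        double z = scores[scores.length - k];
        double threshold = epsilon * z;

        List<Posting> kept = new ArrayList<>();
        for (Posting p : postings) {
            // Postings strictly below the threshold are dropped (assumed comparison).
            if (p.score >= threshold) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Posting> list = List.of(
                new Posting(1, 9.0), new Posting(2, 5.0), new Posting(3, 0.4),
                new Posting(4, 2.0), new Posting(5, 0.1));
        // k = 2, epsilon = 0.1: z = 5.0, threshold = 0.5
        System.out.println(prune(list, 2, 0.1).size());
    }
}
```

With k = 2 and epsilon = 0.1 the example list has z_t = 5.0 and a threshold of
0.5, so three of its five postings survive. A larger k lowers z_t and therefore
the threshold, so fewer postings are removed, which is the same direction as the
trend in the table.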

What could be the reason for such a difference? I count the number of postings
by summing the document frequency over all terms; for each term:

    counter += te.docFreq();
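One possible sanity check on this count: in Lucene 3.x, IndexReader.docFreq()
does not take into account deleted documents that have not yet been merged
away, so a sum of docFreq() values can be larger than the number of live
postings you would get by iterating TermDocs. The toy model below (plain Java,
hypothetical names, no Lucene dependency) only illustrates that difference;
whether either index here actually contains deletions is not known.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, no Lucene): docFreq-style count vs. live-posting count. */
class PostingCountSketch {

    /** Sums posting-list lengths, like summing docFreq(), which ignores deletions. */
    static long sumDocFreq(Map<String, List<Integer>> postings) {
        long total = 0;
        for (List<Integer> docs : postings.values()) {
            total += docs.size();
        }
        return total;
    }

    /** Counts only postings whose document is not deleted, like iterating TermDocs. */
    static long countLivePostings(Map<String, List<Integer>> postings, Set<Integer> deleted) {
        long total = 0;
        for (List<Integer> docs : postings.values()) {
            for (int doc : docs) {
                if (!deleted.contains(doc)) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = Map.of(
                "times", List.of(0, 1, 2),
                "pruning", List.of(1, 3));
        Set<Integer> deleted = Set.of(1);
        System.out.println(sumDocFreq(postings) + " vs " + countLivePostings(postings, deleted));
    }
}
```

Here two of the five postings belong to the deleted document 1, so the
docFreq-style sum reports 5 while the live count is 3.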

Do you know of any papers that use this package in their experiments?


Number of postings in the unpruned index: 37860694

k     Prune (%)         Prune (%)           # postings in
      original paper    pruning package     pruned index
1     49.2              91                  3663309
5     40.2              90                  4139019
10    36.4              89                  4485072
15    34.2              88                  4743474
50    x                 69                  11990022

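Assuming Prune (%) means the share of postings removed, i.e.
100 * (1 - pruned postings / unpruned postings), the pruning-package column can
be recomputed from the posting counts. The helper below is a hypothetical sketch
for that check; with these counts it gives roughly 90.3, 89.1, 88.2, 87.5 and
68.3 percent, which agree with the package column to within one percentage
point.

```java
/** Hypothetical helper: percentage of postings removed by pruning. */
class PrunePercent {

    /** Returns 100 * (1 - pruned / unpruned); requires unpruned > 0. */
    static double prunedPercent(long prunedPostings, long unprunedPostings) {
        if (unprunedPostings <= 0) {
            throw new IllegalArgumentException("unpruned postings must be positive");
        }
        return 100.0 * (1.0 - (double) prunedPostings / unprunedPostings);
    }

    public static void main(String[] args) {
        long unpruned = 37860694L; // postings in the unpruned index (from the table)
        int[] ks = {1, 5, 10, 15, 50};
        long[] pruned = {3663309L, 4139019L, 4485072L, 4743474L, 11990022L};
        for (int i = 0; i < ks.length; i++) {
            System.out.printf("k=%d  pruned=%.1f%%%n", ks[i], prunedPercent(pruned[i], unpruned));
        }
    }
}
```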

Thanks in advance,
Best Regards
ZP




--
View this message in context: 
http://lucene.472066.n3.nabble.com/test-LA-Times-with-pruning-package-tp4007730.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
