Re: Query Expansion Module for Lucene based on BM25 ranking function

2008-10-23 Thread Joaquin Perez Iglesias
Hi Grant and Jose, just to give some more details: as Jose said, avg_length is precalculated at indexing time using a specific Similarity class. Basically this can be done through the lengthNorm method: for each document and field the total length is stored, when the indexing process is finish
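
The message above describes accumulating per-field lengths at indexing time so that an average is available afterwards. A minimal sketch of that bookkeeping in plain Python follows; it does not use Lucene, and the function name `average_field_lengths` and the dict-of-token-lists document shape are assumptions made for illustration, not the authors' code.

```python
from collections import defaultdict


def average_field_lengths(docs):
    """Return the average token count per field.

    docs: iterable of dicts mapping field name -> list of tokens
    (a hypothetical stand-in for documents passed to an indexer).
    """
    total_len = defaultdict(int)   # sum of field lengths seen so far
    doc_count = defaultdict(int)   # number of documents containing the field
    for doc in docs:
        for field, tokens in doc.items():
            total_len[field] += len(tokens)
            doc_count[field] += 1
    # Once every document has been seen, divide totals by counts.
    return {field: total_len[field] / doc_count[field] for field in total_len}


docs = [
    {"title": ["bm25", "ranking"],
     "body": ["lucene", "query", "expansion", "module"]},
    {"title": ["lucene"],
     "body": ["pseudo", "relevance", "feedback", "for", "bm25", "ranking"]},
]
avg = average_field_lengths(docs)
```

In this sketch the averages are only final after the last document has been processed, which matches the idea of computing them once indexing is complete.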

Re: Query Expansion Module for Lucene based on BM25 ranking function

2008-10-22 Thread José Ramón Pérez Agüera
Hi Grant, our query expansion approach is quite simple: we apply pseudo-relevance feedback techniques, where a number of top retrieved documents are used to extract the candidate terms to expand the original query. We have used TermPositions at query time to extract the term statistics n
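
As a rough illustration of pseudo-relevance feedback as described above, the sketch below takes the top-ranked documents, scores the terms they contain, and appends the best ones to the query. The scoring (term frequency in the feedback documents times a simple log(N/df) weight) is an assumption chosen for brevity, not necessarily what the module does, and `expand_query` and its parameters are hypothetical names. Documents are plain token lists rather than Lucene objects.

```python
import math
from collections import Counter


def expand_query(query_terms, ranked_docs, collection, top_docs=2, num_terms=2):
    """Append the best-scoring feedback terms to the original query.

    ranked_docs must be drawn from collection, so every feedback term
    has a document frequency of at least 1.
    """
    n_docs = len(collection)
    # Document frequency of every term in the collection.
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    # Term frequency summed over the top retrieved (feedback) documents.
    tf = Counter()
    for doc in ranked_docs[:top_docs]:
        tf.update(doc)
    # Assumed weighting: tf in feedback docs times log(N / df).
    scores = {
        term: tf[term] * math.log(n_docs / df[term])
        for term in tf
        if term not in query_terms
    }
    # Sort by descending score; break ties alphabetically for determinism.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:num_terms]
    return list(query_terms) + best


collection = [
    ["bm25", "ranking", "function", "lucene"],
    ["bm25", "ranking", "okapi", "ranking"],
    ["lucene", "index", "segment"],
    ["cooking", "recipe", "index"],
]
ranked = [collection[0], collection[1], collection[2]]
expanded = expand_query(["bm25"], ranked, collection)
```

The expanded query would then be run as a second retrieval pass; how the added terms are weighted against the original ones is left out of this sketch.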

Re: Query Expansion Module for Lucene based on BM25 ranking function

2008-10-22 Thread Grant Ingersoll
Hi José, Can you explain your implementation approach? I'm curious how you incorporated the avg. doc length. Also, have you followed any of the flexible indexing discussions? Finally, what's the license on this code? Thanks, Grant On Oct 21, 2008, at 10:14 AM, José Ramón Pérez Agüer

Query Expansion Module for Lucene based on BM25 ranking function

2008-10-21 Thread José Ramón Pérez Agüera
Hello, we have implemented a research module for Lucene using BM25 and our structured version of BM25 as ranking functions, and a couple of state-of-the-art query expansion algorithms. This implementation is quite different from other query expansion modules for Lucene that are available on the web. We
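
For readers unfamiliar with BM25, the sketch below shows the textbook per-term BM25 score, which is where the average document length enters. It uses the classic Robertson/Sparck Jones IDF and the common defaults k1 = 1.2 and b = 0.75; these choices and the function name `bm25_term_score` are assumptions for illustration and say nothing about how the announced module, or its structured variant, is implemented.

```python
import math


def bm25_term_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document.

    tf: term frequency in the document
    df: number of documents containing the term
    n_docs: number of documents in the collection
    doc_len, avg_len: document length and average document length
    """
    # Classic IDF; note it turns negative when df > n_docs / 2.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Length normalisation: b = 0 disables it, b = 1 applies it fully.
    norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Same term statistics, two documents of different length.
short_doc = bm25_term_score(tf=2, df=3, n_docs=100, doc_len=50, avg_len=100)
long_doc = bm25_term_score(tf=2, df=3, n_docs=100, doc_len=200, avg_len=100)
```

A document's score for a query is the sum of this value over the query terms. With equal term frequency, the shorter document scores higher here, which is the effect the average length is needed for.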