On Fri, Jan 27, 2012 at 4:53 PM, Hany Azzam <h...@eecs.qmul.ac.uk> wrote:
> Hi Robert,
>
> Thanks for the reply. I am trying to do something different. If I use a
> multireader then the searching/scoring will take place over the two indexes at
> the same time. However, in my case the subcomponents of the retrieval model
> are calculated over separate evidence spaces. For example, the retrieval
> model calculates something like this:
>
>   score := P(query_term | documents) * P(query_term | relevant_documents)
>
> The P(query_term | documents) can be estimated using the index over the whole
> collection of documents. The P(query_term | relevant_documents) can be
> estimated using the index over the relevant documents only (which are known
> prior to the execution of the query).
>
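
To make the formula above concrete, here is a minimal sketch in plain Java (no Lucene involved). It assumes each probability is a simple maximum-likelihood estimate, i.e. occurrences of the term divided by the total number of term occurrences in that evidence space. The class name, method names and counts are hypothetical and only illustrate the multiplication; the estimator actually used in the retrieval model may well differ (for example, it may be smoothed).

```java
// Hypothetical sketch: score := P(term | documents) * P(term | relevant_documents).
// Each probability is estimated as termFreq / totalTermFreq (an assumption,
// not something stated in the original message).
final class TwoSpaceScore {

    /** Maximum-likelihood estimate of P(term | space); 0 for an empty space. */
    static double probability(long termFreq, long totalTermFreq) {
        if (totalTermFreq <= 0) {
            return 0.0;
        }
        return (double) termFreq / (double) totalTermFreq;
    }

    /** Multiplies the estimates taken from the two separate evidence spaces. */
    static double score(long tfDocuments, long totalDocuments,
                        long tfRelevant, long totalRelevant) {
        return probability(tfDocuments, totalDocuments)
             * probability(tfRelevant, totalRelevant);
    }

    public static void main(String[] args) {
        // Made-up counts: 50 of 10,000 tokens in the whole collection,
        // 20 of 400 tokens in the relevant documents.
        System.out.println(score(50, 10_000, 20, 400));
    }
}
```

The point is only that the two factors come from different sets of counts, which is why a single combined reader does not fit this model directly.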
In this situation, if you want to combine the statistics from different
indexes in your own way, you can look at IndexSearcher.termStatistics()
and IndexSearcher.collectionStatistics(). These are intended for
situations like distributed search, but maybe you can make use of them.

Here is some pseudocode:

    IndexReader relevant = IndexReader.open(relevantDirectory);
    IndexReader documents = IndexReader.open(documentsDirectory);

    // final, so the anonymous subclass below can refer to it
    final IndexSearcher relevantSearcher = new IndexSearcher(relevant);

    IndexSearcher documentsSearcher = new IndexSearcher(documents) {
      @Override
      public CollectionStatistics collectionStatistics(String field) throws IOException {
        // statistics from the whole-collection index
        CollectionStatistics documentStats = super.collectionStatistics(field);
        return new CollectionStatistics(... someCombinationOf(documentStats + stuff from relevantSearcher));
      }

      // do a similar thing for termStatistics()....
    };

    documentsSearcher.search(...)

--
lucidimagination.com
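
P.S. One possible shape for the "someCombinationOf" step, as a rough sketch only. FieldStats below is a hypothetical stand-in that mirrors the four numeric values carried by CollectionStatistics (maxDoc, docCount, sumTotalTermFreq, sumDocFreq); it is not a Lucene class. The field-wise sum shown here is the kind of merge that makes sense for distributed search over disjoint shards. If the relevant documents are also present in the whole-collection index, summing would count them twice, so a different combination would be needed for that case.

```java
// Hypothetical stand-in for the numeric values held by Lucene's
// CollectionStatistics; used here only to illustrate one way of merging.
final class FieldStats {
    final long maxDoc;            // total number of documents
    final long docCount;          // documents that have at least one term in the field
    final long sumTotalTermFreq;  // total number of tokens in the field
    final long sumDocFreq;        // total number of postings in the field

    FieldStats(long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
        this.maxDoc = maxDoc;
        this.docCount = docCount;
        this.sumTotalTermFreq = sumTotalTermFreq;
        this.sumDocFreq = sumDocFreq;
    }

    /** Field-wise sum; assumes the two indexes share no documents. */
    static FieldStats sum(FieldStats a, FieldStats b) {
        return new FieldStats(
            a.maxDoc + b.maxDoc,
            a.docCount + b.docCount,
            a.sumTotalTermFreq + b.sumTotalTermFreq,
            a.sumDocFreq + b.sumDocFreq);
    }
}
```

Inside the overridden collectionStatistics(), values merged this way would then be passed to the CollectionStatistics constructor together with the field name.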