Hi all, our problem is to choose the best (the fastest) way to iterate over huge set of documents (basic and most important case is to iterate over all documents in the index). Some slow process accesses documents and now it is done via repeating query (for instance MatchAllDocsQuery). It processess first N docs then repeats query and processes next N docs and so on. Repeating query means in fact quadratic time! So we think about changing the way docs are accessed. In case of generic query the only way to speed it up we see is to keep HitCollector in memory between requests for docs. Isn't this approach too memory consuming? In case of iterating over all documents I was wondering if there is a way to determine set of index ids over which we could iterate (and of course control index changes - if index is changed between requests we should probably invalidate 'iterating session'). What is the best solution for this problem? Thanks and regards,
wojtek