Chris Hostetter wrote:

But I don't really see how it changes anything ... if your goal is to perform an operation on every result in a set, why are you using TopDocsCollector? Why not write a HitCollector and perform your operation in the collect method?

We're not using TopDocCollector right now, as we're still using Hits. Performing some operation over every result is just one use case. We also have to deal with the user scrolling the display. Currently this works acceptably using the same java.util.List model for both cases. Sometimes a bulk operation needs to iterate over the items more than once, which makes it trickier to invert. (I guess we'd have to perform the search twice, and then if the results change between the two executions, other problems happen. Some of this we can get around by caching the bitsets for filters, which are the only things which change.)

I don't disagree with you, but is there a reason to have the entire set in memory all at once regardless of its size?

No... that's why maybe I thought keeping the rest on disk would be a plan, whether it's done as a bitset (and re-execute a trivial search with the cached bitset) or a list of integers.
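As a rough illustration of the "list of integers on disk" option, here is a minimal sketch using only the JDK. The class name DocIdSpill and both method names are made up for this example, and it assumes the document ids have already been gathered in result order; it is not something Lucene provides.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: keeps a result list on disk as fixed-width ints.
class DocIdSpill {
    // Writes the ids to a temp file, 4 bytes each, in the order given.
    static Path write(int[] docIds) {
        try {
            Path file = Files.createTempFile("docids", ".bin");
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(file)))) {
                for (int id : docIds) {
                    out.writeInt(id);
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads up to `count` ids starting at result position `offset`,
    // so a scrolling display only loads the page it is showing.
    static int[] read(Path file, int offset, int count) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long total = raf.length() / 4;
            int n = (int) Math.max(0, Math.min(count, total - offset));
            int[] page = new int[n];
            raf.seek(4L * offset);
            for (int i = 0; i < n; i++) {
                page[i] = raf.readInt();
            }
            return page;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because every entry has the same width, jumping to an arbitrary scroll position is a single seek, and memory use is bounded by the page size rather than the result count.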

At the end of the day: people switching to using TopDocsCollector instead of Hits are no worse off when trying to iterate over every result in a ginormous result set, they just have to define "ginormous" for themselves, and they get an OOM right away instead of once they iterate up to that many.

This is effectively our problem: if it fails fast, the user just says "well, why couldn't I even see the first 10 results?" It's much better to fail with an OOM later when retrieving result someBigValue + 1.

A custom HitCollector will definitely get around that. Really, I'm just saying it's a shame to lose a part of Lucene which works for the case for which it was designed, just because someone deemed that nobody needed it for that case. The first comment on the bug report for deprecating it even says something like "it was originally designed for GUI but was anyone even using it for that?" Some of us obviously were.
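To make the custom collector route concrete, here is a minimal sketch, not code we actually run. The HitCollector class below is only a stand-in with the same collect(int doc, float score) shape as Lucene's, declared locally so the example compiles without Lucene on the classpath; BitSetCollector, size and forEachMatch are hypothetical names.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

// Stand-in for Lucene's HitCollector (assumption: same method shape),
// declared here only so the sketch is self-contained.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Remembers which documents matched using one bit per document.
// Scores and relevance order are deliberately dropped.
class BitSetCollector extends HitCollector {
    private final BitSet matches;

    BitSetCollector(int maxDoc) {
        this.matches = new BitSet(maxDoc);
    }

    @Override
    public void collect(int doc, float score) {
        matches.set(doc);
    }

    // Number of distinct matching documents.
    int size() {
        return matches.cardinality();
    }

    // Visits every match in ascending doc id order. It can be called
    // repeatedly, so a bulk operation that needs two passes does not
    // have to run the search twice.
    void forEachMatch(IntConsumer action) {
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            action.accept(doc);
        }
    }
}
```

With the real class, the collector would presumably be handed to the searcher's search method along with the query. The cost is roughly maxDoc / 8 bytes however many documents match, which is what makes this bounded where a full list of hits is not, but it gives up the ranked order a scrolling display needs.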

Daniel


--
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                              The world's most advanced
Nuix                                                email data analysis
http://nuix.com/                                and eDiscovery software
