DocIdSet to represent small number of hits in large Document set

Antony Bowesman Mon, 04 Apr 2011 23:25:26 -0700

I'm converting a Lucene 2.3.2 deployment to 2.4.1 (with a view to going to 2.9.4).

Many of our indexes are 5M+ Documents; however, only a small subset of these are relevant to any one user. As a DocIdSet backed by a BitSet or OpenBitSet is rather inefficient in terms of memory use, what is the recommended DocIdSet implementation to use in this scenario?

Seems like SortedVIntList can be used to store the info, but it has no methods to build the list in the first place, requiring an array or bitset in the constructor.

I had used Nutch's DocSet and HashDocSet implementations in my 2.3.2 deployment, but want to move away from that Nutch dependency, so wondered if Lucene had a way to do this?
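
To make the question concrete, here is a rough sketch of the kind of structure I have in mind: a growable sorted int array that is filled as hits arrive, instead of a bitset sized to the whole index. The class SparseDocIds and all of its methods are hypothetical names made up for illustration; they are not part of Lucene or Nutch, and the sketch assumes doc ids are added in increasing order. The trimmed array could presumably then be passed to a constructor that accepts a sorted int array.

```java
import java.util.Arrays;

/** Hypothetical sparse doc-id collector; not a Lucene or Nutch class. */
class SparseDocIds {
    private int[] docs = new int[16];
    private int size = 0;

    /** Adds a doc id. Assumes ids arrive in strictly increasing order. */
    void add(int doc) {
        if (size > 0 && doc <= docs[size - 1]) {
            throw new IllegalArgumentException("doc ids must be strictly increasing");
        }
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2); // grow by doubling
        }
        docs[size++] = doc;
    }

    int size() {
        return size;
    }

    /** Returns a trimmed, sorted copy of the collected ids. */
    int[] toSortedArray() {
        return Arrays.copyOf(docs, size);
    }

    /** Returns the first stored id >= target, or -1 if there is none. */
    int advance(int target) {
        int idx = Arrays.binarySearch(docs, 0, size, target);
        if (idx < 0) {
            idx = -idx - 1; // convert "not found" result to insertion point
        }
        return idx < size ? docs[idx] : -1;
    }

    /** Payload bytes used by the stored ids (4 bytes per int). */
    long payloadBytes() {
        return 4L * size;
    }

    /** Payload bytes of a plain bitset covering maxDoc documents. */
    static long bitSetBytes(int maxDoc) {
        return (maxDoc + 7L) / 8;
    }

    public static void main(String[] args) {
        SparseDocIds hits = new SparseDocIds();
        hits.add(3);
        hits.add(70);
        hits.add(4999999);
        System.out.println("sparse bytes: " + hits.payloadBytes());
        System.out.println("bitset bytes: " + SparseDocIds.bitSetBytes(5000000));
    }
}
```

The point of the comparison in main is only that a bitset over 5M Documents costs the same per user no matter how few hits there are, whereas the array grows with the number of hits. The array stops being the smaller option once the hits exceed roughly 1/32 of the Documents.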


Thanks

