Re: DocIdSet to represent small number of hits in large Document set

Jason Rutherglen Tue, 05 Apr 2011 07:54:11 -0700

I think Solr has a HashDocSet implementation?
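
For illustration only, here is a minimal sketch of the general idea behind a sparse doc-id set: collect hits incrementally, then keep them as a sorted int array. The class name SparseDocIds and its methods are hypothetical; this is not Lucene's DocIdSet API or Solr's HashDocSet, just plain Java under the assumption that hits are few relative to the index size. For scale, a bitset over 5M documents costs about 625 KB per set regardless of hit count, while 10,000 hits as ints take about 40 KB.

```java
import java.util.Arrays;

/** Hypothetical sparse doc-id set (not a Lucene or Solr class). */
final class SparseDocIds {
    private final int[] docs; // sorted ascending, no duplicates

    private SparseDocIds(int[] docs) { this.docs = docs; }

    int size() { return docs.length; }

    /** Membership test by binary search: O(log n). */
    boolean contains(int docId) {
        return Arrays.binarySearch(docs, docId) >= 0;
    }

    /** Smallest stored doc id >= target, or -1 if there is none. */
    int advance(int target) {
        int i = Arrays.binarySearch(docs, target);
        if (i < 0) i = -i - 1; // convert "not found" result to insertion point
        return i < docs.length ? docs[i] : -1;
    }

    /** Copy of the sorted ids, e.g. for code that wants a sorted int[]. */
    int[] toArray() { return docs.clone(); }

    /** Collects ids in any order; sorts and de-duplicates on build(). */
    static final class Builder {
        private int[] buf = new int[16];
        private int count;

        Builder add(int docId) {
            if (count == buf.length) buf = Arrays.copyOf(buf, count * 2);
            buf[count++] = docId;
            return this;
        }

        SparseDocIds build() {
            int[] sorted = Arrays.copyOf(buf, count);
            Arrays.sort(sorted);
            int n = 0;
            for (int i = 0; i < sorted.length; i++) {
                // keep an id only if it differs from the previous one
                if (i == 0 || sorted[i] != sorted[i - 1]) sorted[n++] = sorted[i];
            }
            return new SparseDocIds(Arrays.copyOf(sorted, n));
        }
    }
}
```

The sorted array from toArray() is the kind of input that the quoted message below says SortedVIntList requires in its constructor, so a builder like this could fill that gap. Whether it beats a hash-based set depends on the access pattern: sorted arrays favor in-order iteration, hashing favors random lookups.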

On Tue, Apr 5, 2011 at 3:19 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> Can we simply factor out (poach!) those useful-sounding classes from
> Nutch into Lucene?
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman <a...@thorntothehorn.org> 
> wrote:
>> I'm converting a Lucene 2.3.2 deployment to 2.4.1 (with a view to going to 2.9.4).
>>
>> Many of our indexes hold 5M+ Documents, but only a small subset of these
>> are relevant to any user.  Since a DocIdSet backed by a BitSet or OpenBitSet
>> is rather inefficient in terms of memory use, what is the recommended
>> DocIdSet implementation to use in this scenario?
>>
>> Seems like SortedVIntList can be used to store the info, but it has no
>> methods to build the list in the first place, requiring an array or bitset
>> in the constructor.
>>
>> I had used Nutch's DocSet and HashDocSet implementations in my 2.3.2
>> deployment, but want to move away from that Nutch dependency, so wondered if
>> Lucene had a way to do this?
>>
>> Thanks
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>


