Hello,

I am migrating a rather large application from Lucene 4.10 to Lucene 5.5.0.
Since Filters are deprecated in Lucene 5, I am looking for an efficient 
replacement in our code.

We use many Filters that compute their DocIdSet by looking up each document's
numeric DocValues and checking the value against some collection of values.
Everything is based on "long" types, and the result sets can be large.
Simplified code from one of our Filter classes looks like this:

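    @Override
    public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
            throws IOException {
        AtomicReader reader = context.reader();
        NumericDocValues docValues = reader.getNumericDocValues(filterKeyName);
        if (docValues == null) {
            return null; // field has no DocValues in this segment: no matches
        }

        int maxDoc = reader.maxDoc();
        OpenBitSet docSet = new OpenBitSet(maxDoc); // one bit per doc in segment
        for (int doc = 0; doc < maxDoc; doc++) {
            if (acceptDocs != null && !acceptDocs.get(doc)) {
                continue; // skip deleted or otherwise excluded docs
            }
            long value = docValues.get(doc); // DocValues for the current doc
            if (isMatch(value)) {            // check value against some condition
                docSet.set(doc);             // set bit for matching doc
            }
        }
        return docSet;
    }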
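
To make the per-segment logic above concrete without depending on Lucene,
here is a minimal stdlib-only Java model of it: a long[] stands in for
NumericDocValues, a java.util.BitSet of live documents stands in for
acceptDocs, and the returned BitSet plays the role of the DocIdSet. The class
name DocValuesFilterModel and the example predicate are hypothetical; this is
a sketch of the idea, not Lucene code.

```java
import java.util.BitSet;
import java.util.function.LongPredicate;

/**
 * Stdlib-only model of the per-segment filter (hypothetical, no Lucene types):
 * values[doc] stands in for NumericDocValues.get(doc), liveDocs for acceptDocs,
 * and the returned BitSet for the DocIdSet.
 */
public final class DocValuesFilterModel {

    /** Returns a bit set with one bit per document of the segment. */
    public static BitSet matchingDocs(long[] values, BitSet liveDocs, LongPredicate isMatch) {
        int maxDoc = values.length;
        BitSet docSet = new BitSet(maxDoc); // sized up front, like OpenBitSet(maxDoc)
        for (int doc = 0; doc < maxDoc; doc++) {
            if (liveDocs != null && !liveDocs.get(doc)) {
                continue; // skip deleted or otherwise excluded docs
            }
            if (isMatch.test(values[doc])) { // hypothetical match condition
                docSet.set(doc);
            }
        }
        return docSet;
    }

    public static void main(String[] args) {
        long[] values = {5L, 42L, 7L, 42L, 100L};
        BitSet live = new BitSet(values.length);
        live.set(0, values.length);
        live.clear(3); // pretend doc 3 is deleted
        BitSet result = matchingDocs(values, live, v -> v == 42L || v == 100L);
        System.out.println(result); // {1, 4}
    }
}
```

The bit set costs one bit per document in the segment (roughly 1.25 MB for
10 million documents), no matter how many values isMatch() accepts.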


What is the proper and efficient replacement for this kind of filtering?

Should I convert my set of matching values into a TermsQuery and wrap it in a
ConstantScoreQuery?
I could do this, but then I am concerned about:

* Efficiency:
The set of values accepted by the isMatch() method above could be very large.
I would need to create a large collection of Terms instead of the
memory-efficient DocIdSet.


* More efficiency:
From my current understanding, I would need to create a Term from the String
representation of each long value. Isn't this inefficient as well?
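
To illustrate the String concern, here is a minimal stdlib-only Java sketch
comparing a decimal String term for a long with a fixed-width 8-byte encoding
(big-endian, sign bit flipped) whose unsigned byte order matches numeric order.
It only shows the general idea behind binary numeric terms; Lucene 5.x's
LongField indexes its own prefix-coded format, and the class and method names
here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical comparison of two ways to turn a long into term bytes. */
public final class LongTermEncoding {

    /** Decimal String form: 1 to 20 bytes, plus a formatting step per value. */
    public static byte[] asDecimalString(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    /** Fixed 8 bytes; flipping the sign bit makes negatives sort first. */
    public static byte[] asSortableBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Unsigned lexicographic comparison, the order Lucene uses for term bytes. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long big = 1_234_567_890_123L;
        System.out.println(asDecimalString(big).length); // 13
        System.out.println(asSortableBytes(big).length); // 8
        // Decimal strings do not sort numerically: "10" sorts before "9".
        System.out.println(compareUnsigned(asDecimalString(10L), asDecimalString(9L)) < 0); // true
        System.out.println(compareUnsigned(asSortableBytes(10L), asSortableBytes(9L)) > 0); // true
    }
}
```

A decimal String needs up to 20 bytes (for Long.MIN_VALUE), while the binary
form is always 8 bytes and keeps numeric order.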

I would really appreciate any recommendations on this.

Thanks a lot and best regards,
Josef
