Re: 4.0 and 4.1 FieldCacheImpl.DocTermsImpl.exists(docid) possibly broken

Michael McCandless Thu, 18 Jul 2013 06:55:56 -0700

Thanks Doron, that's definitely completely backwards!!

Good thing the API is gone.



Mike McCandless

http://blog.mikemccandless.com


On Thu, Jul 18, 2013 at 7:50 AM, Doron Cohen <[email protected]> wrote:
> Hi, just an FYI - may be helpful for anyone obliged to use 4.0.0 or 4.1.0 -
> it seems that this method is actually doing the opposite of its intention.
>
> I did not find mentions of this in the lists or elsewhere.
>
> This is the code for o.a.l.search.FieldCacheImpl.DocTermsImpl.exists(int):
> public boolean exists(int docID) {
>       return docToOffset.get(docID) == 0;
> }
>
> Its description says: "Returns true if this doc has this field and is not
> deleted".
> But it returns true for docs not containing the field and false for those
> that do contain it.
>
> A simple workaround is to not to call this method before calling getTerm()
> but rather just rely on getTerm()  logic: "... returns the same BytesRef, or
> an empty (length=0) BytesRef if the doc did not have this field or was
> deleted."
>
> So usage code can be like this:
> DocTerms values =  FieldCache.DEFAULT.getTerms(reader, FIELD_NAME);
> BytesRef term = new BytesRef();
> for (int docid=0; docid<values.size(); docid++) {
> term = values.getTerm(docid, term);
> if (term.length>0) {
> doSomethingWith(term.utf8ToString());
> }
> }
> FieldCache.DEFAULT.purge(reader);
>
> I am not sure about the overhead of this comparing to first checking
> exists(), but it at least work correctly.
>
> The code for exists() was as above until R1442497
> (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?revision=1442497&view=markup)
> and then in R1443717
> (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?r1=1442497&r2=1443717&diff_format=h)
> the API was change as part of LUCENE-4547 (DocValues improvements) which was
> included in 4.2.
>
> Simple code to demonstrate this (here with 4.1 but same results with 4.0):
>
> RAMDirectory d = new RAMDirectory();
> IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_41,
> new SimpleAnalyzer(Version.LUCENE_41)));
> w.addDocument(new Document()); // Empty doc (0, 0)
> Document doc = new Document(); // Real doc (1, 1)
> doc.add(new StringField("f1", "v1", Store.NO));
> w.addDocument(doc);
> w.addDocument(new Document()); // Empty doc (2, 2)
> w.addDocument(new Document()); // Empty doc (3, 3)
> w.commit(); // Commit - so we'll have two atomic readers
> doc = new Document(); // RealDoc (0, 4)
> doc.add(new StringField("f1", "v2", Store.NO));
> w.addDocument(doc);
> w.addDocument(new Document()); // Empty doc (1, 5)
> w.close();
>
> IndexReader r = DirectoryReader.open(d);
> BytesRef br = new BytesRef();
> for (AtomicReaderContext leaf : r.leaves()) {
> System.out.println("--- new atomic reader");
> AtomicReader reader = leaf.reader();
> DocTerms a = FieldCache.DEFAULT.getTerms(reader, "f1");
> for (int i = 0; i < reader.maxDoc(); ++i) {
> int n = leaf.docBase + i;
> System.out.println(n+" exists: "+a.exists(i));
> br = a.getTerm(i, br);
> if (br.length > 0) {
> System.out.println(n + "  "+br.utf8ToString());
> }
> }
> }
>
> The result printing:
>
>       --- new atomic reader
>       0 exists: true
>       1 exists: false
>       1  v1
>       2 exists: true
>       3 exists: true
>       --- new atomic reader
>      4 exists: false
>      4  v2
>      5 exists: true
>
> Indeed, exists() results are wrong.
>
> So again, just an FYI, as this API no longer exists...
>
> Regards,
> Doron

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: 4.0 and 4.1 FieldCacheImpl.DocTermsImpl.exists(docid) possibly broken

Reply via email to