[PR] [BugFix] Fix NOT TEXT_MATCH false positives on consuming segments [pinot]

via GitHub Fri, 13 Mar 2026 13:15:04 -0700


heng-kuang-777 opened a new pull request, #17880:
URL: https://github.com/apache/pinot/pull/17880


   On consuming segments, Lucene operates in near-realtime mode, so recently 
ingested documents may not yet be visible to the IndexSearcher until the next 
SearcherManager refresh. When evaluating NOT TEXT_MATCH, the filter inverted 
the match result over [0, numDocs), the full segment doc count. Tail documents 
that were not yet indexed could never match, so the inversion reported them as 
false positives.
   
   Fix by introducing `getSearchableDocCount()` on `TextIndexReader`. On realtime 
indexes it returns the number of documents currently visible to the Lucene 
searcher (updated on each refresh); on offline/sealed segments, where all docs 
are indexed, it returns -1. `TextMatchFilterOperator` now uses this count 
instead of numDocs as the inversion universe, so unindexed tail docs are 
excluded from NOT results.
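   To make the effect concrete, here is a minimal sketch of the inversion before and after the change. It is illustrative only: it uses `java.util.BitSet` rather than Pinot's own bitmap types, and the class and helper names (`NotTextMatchSketch`, `invert`, `inversionUniverse`) are hypothetical. It also assumes that a count of -1 means "fall back to numDocs", which is how the description above reads but is not spelled out in it.

```java
import java.util.BitSet;

// Hypothetical sketch; not Pinot's TextMatchFilterOperator.
class NotTextMatchSketch {

  // Computes NOT over the doc-ID universe [0, universe).
  static BitSet invert(BitSet matches, int universe) {
    BitSet result = new BitSet(universe);
    result.set(0, universe); // start with every doc in [0, universe)
    result.andNot(matches);  // drop docs that matched TEXT_MATCH
    return result;
  }

  // Assumption: -1 (offline/sealed segment, all docs indexed) falls back to numDocs.
  static int inversionUniverse(int numDocs, int searchableDocCount) {
    return searchableDocCount < 0 ? numDocs : searchableDocCount;
  }

  public static void main(String[] args) {
    int numDocs = 10;           // docs ingested into the consuming segment
    int searchableDocCount = 7; // docs visible after the last SearcherManager refresh

    // TEXT_MATCH hits; Lucene can only return docs 0..6 here.
    BitSet matches = new BitSet();
    matches.set(2);
    matches.set(5);

    BitSet before = invert(matches, numDocs);
    BitSet after = invert(matches, inversionUniverse(numDocs, searchableDocCount));
    System.out.println("before fix: " + before); // {0, 1, 3, 4, 6, 7, 8, 9}
    System.out.println("after fix:  " + after);  // {0, 1, 3, 4, 6}
  }
}
```

   In the "before" result, docs 7, 8 and 9 show up only because the searcher has not seen them yet. Bounding the universe by the searchable count removes them until the next refresh makes them matchable.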
   
   Fixes #17809 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

