atris opened a new pull request, #14525:
URL: https://github.com/apache/lucene/pull/14525
Add AnytimeRankingSearcher for SLA-aware early termination with bin-based
score boosting
This patch adds AnytimeRankingSearcher, a new low-latency search
implementation that supports early termination under SLA constraints, combined
with bin-aware score boosting.
## Architecture
Index-time binning uses a configurable post-indexing pass to assign each
document to one of bin.count bins. This pass is activated via field attributes
(doBinning=true, bin.count=N, etc.) and is triggered after all standard
postings are written. Binning uses a segment-local sparse similarity graph
where each node is a document and edges represent cosine similarity between
term frequency vectors.
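As a rough illustration of the edge weight described above, here is a minimal sketch of cosine similarity over sparse term-frequency vectors. `TermVectorSimilarity` and its `Map`-based representation are hypothetical and for illustration only; the patch may represent vectors differently.

```
import java.util.Map;

/** Hypothetical helper, not part of the patch. */
final class TermVectorSimilarity {

  /** Cosine similarity between two sparse term-frequency vectors (term -> frequency). */
  static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
    // Walk the smaller map so the dot product costs O(min(|a|, |b|)) lookups.
    Map<String, Integer> small = a.size() <= b.size() ? a : b;
    Map<String, Integer> large = small == a ? b : a;
    double dot = 0.0;
    for (Map.Entry<String, Integer> e : small.entrySet()) {
      Integer other = large.get(e.getKey());
      if (other != null) {
        dot += (double) e.getValue() * other;
      }
    }
    double normA = norm(a);
    double normB = norm(b);
    // An empty vector has no direction; treat it as dissimilar to everything.
    if (normA == 0.0 || normB == 0.0) {
      return 0.0;
    }
    return dot / (normA * normB);
  }

  /** Euclidean length of a sparse vector. */
  private static double norm(Map<String, Integer> v) {
    double sum = 0.0;
    for (int f : v.values()) {
      sum += (double) f * f;
    }
    return Math.sqrt(sum);
  }
}
```

In a sparse graph, an edge would presumably only be kept when this value clears some cutoff; that cutoff is not specified here.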
The bin distribution is computed via recursive graph bisection. The graph is
recursively split into halves using a seeded heuristic that assigns each
document to the closer of two seed nodes based on edge weights. This favors
intra-bin similarity and reduces cross-bin connectivity. A fixed number of
bins is produced, and the assignment is saved to a .binmap file.
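The bisection step can be sketched roughly as follows. This is a simplified illustration, not the patch's code: `SeededBisection` is a hypothetical name, a dense similarity matrix stands in for the sparse graph, the seed choice (the first document and the document least similar to it) is an assumption, and the sketch only handles power-of-two bin counts.

```
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of seeded recursive bisection, not the patch's implementation. */
final class SeededBisection {

  /** Assigns docs 0..n-1 to binCount bins; sim[i][j] is the edge weight between docs i and j. */
  static int[] assign(double[][] sim, int binCount) {
    if (Integer.bitCount(binCount) != 1) {
      throw new IllegalArgumentException("this sketch needs a power-of-two bin count");
    }
    int[] bins = new int[sim.length];
    List<Integer> all = new ArrayList<>();
    for (int d = 0; d < sim.length; d++) {
      all.add(d);
    }
    split(sim, all, 0, binCount, bins);
    return bins;
  }

  /** Splits docs across the bin range [firstBin, firstBin + width). */
  private static void split(double[][] sim, List<Integer> docs, int firstBin, int width, int[] bins) {
    if (width == 1 || docs.size() < 2) {
      for (int d : docs) {
        bins[d] = firstBin;
      }
      return;
    }
    // Assumed seed heuristic: first doc, plus the doc least similar to it.
    int seedA = docs.get(0);
    int seedB = docs.get(1);
    for (int d : docs) {
      if (d != seedA && sim[seedA][d] < sim[seedA][seedB]) {
        seedB = d;
      }
    }
    List<Integer> left = new ArrayList<>();
    List<Integer> right = new ArrayList<>();
    for (int d : docs) {
      if (d == seedA) {
        left.add(d);
      } else if (d == seedB) {
        right.add(d);
      } else if (sim[d][seedA] >= sim[d][seedB]) {
        left.add(d); // closer to (or tied with) seed A
      } else {
        right.add(d);
      }
    }
    split(sim, left, firstBin, width / 2, bins);
    split(sim, right, firstBin + width / 2, width / 2, bins);
  }
}
```

Because each document simply joins the closer seed, the two sides are not guaranteed to be equal in size in this sketch.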
In approximate mode (graph.builder=approx), we avoid building explicit term
vectors. Instead, token co-occurrence is tracked using per-term BitSets, and
documents are grouped using lightweight overlap heuristics. This trades off
precision for speed and scales better on large segments.
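To make the BitSet idea concrete, here is a minimal sketch that records which documents contain each term and counts shared terms between two documents. `TermOverlapSketch` and the shared-term count are assumptions chosen for illustration; the actual overlap heuristics in the patch may differ.

```
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of BitSet-based overlap tracking, not the patch's implementation. */
final class TermOverlapSketch {

  /** Builds term -> BitSet of doc IDs containing that term; docs.get(i) holds doc i's tokens. */
  static Map<String, BitSet> index(List<List<String>> docs) {
    Map<String, BitSet> postings = new HashMap<>();
    for (int doc = 0; doc < docs.size(); doc++) {
      for (String term : docs.get(doc)) {
        postings.computeIfAbsent(term, t -> new BitSet()).set(doc);
      }
    }
    return postings;
  }

  /** Number of distinct terms that occur in both docA and docB. */
  static int sharedTerms(Map<String, BitSet> postings, int docA, int docB) {
    int shared = 0;
    for (BitSet docsWithTerm : postings.values()) {
      if (docsWithTerm.get(docA) && docsWithTerm.get(docB)) {
        shared++;
      }
    }
    return shared;
  }
}
```

No term-frequency vectors are materialized here, which is the trade-off the approximate mode describes: presence bits only, so repeated terms in a document carry no extra weight.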
At search time, BinMapReader loads the bin assignments, and BinScoreReader
makes them accessible to search collectors. BinBoostCalculator assigns a boost
score to each bin based on estimated bin quality (e.g. average term frequency
or rank share in a warmup run). This boost is applied additively during
ranking, allowing the collector to prioritize high-quality bins earlier and
exit faster under SLA pressure.
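A minimal sketch of the additive boost, assuming the per-bin quality signal is a mean over a per-document value (for example, average term frequency). `BinBoostSketch`, the `maxBoost` parameter, and the normalization to the best bin are hypothetical choices for illustration and are not taken from BinBoostCalculator.

```
/** Hypothetical sketch of additive bin boosting, not the patch's BinBoostCalculator. */
final class BinBoostSketch {

  /** Boost per bin: mean doc quality in the bin, scaled so the best bin receives maxBoost. */
  static float[] computeBoosts(int[] docToBin, float[] docQuality, int binCount, float maxBoost) {
    float[] sum = new float[binCount];
    int[] count = new int[binCount];
    for (int doc = 0; doc < docToBin.length; doc++) {
      sum[docToBin[doc]] += docQuality[doc];
      count[docToBin[doc]]++;
    }
    float[] mean = new float[binCount];
    float best = 0f;
    for (int b = 0; b < binCount; b++) {
      mean[b] = count[b] == 0 ? 0f : sum[b] / count[b];
      best = Math.max(best, mean[b]);
    }
    float[] boost = new float[binCount];
    for (int b = 0; b < binCount; b++) {
      boost[b] = best == 0f ? 0f : maxBoost * mean[b] / best;
    }
    return boost;
  }

  /** Adds the doc's bin boost to its base score. */
  static float boostedScore(float baseScore, int doc, int[] docToBin, float[] boost) {
    return baseScore + boost[docToBin[doc]];
  }
}
```

Since the boost is additive rather than multiplicative, a document with a base score of zero still receives its bin's boost; whether that is desirable depends on how the collector uses the result.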
## Binning Modes (Index Time)
This patch supports two modes of document binning during indexing:
• Exact mode (graph.builder=exact): computes exact bin assignments using full
document similarity graphs.
• Approximate mode (graph.builder=approx): skips graph construction and uses
faster heuristics to assign bins; also enabled when document count exceeds a
threshold.
Bin assignment is handled by DocBinningGraphBuilder and switches to
ApproximateDocGraphBuilder automatically when needed.
To enable binning, field attributes must be set:
```
fieldType.putAttribute("postingsFormat", "Lucene101");
fieldType.putAttribute("doBinning", "true");
fieldType.putAttribute("bin.count", "4"); // total number of bins
// binning strategy: one of "exact", "approx", or "auto"
fieldType.putAttribute("graph.builder", "auto");
```
## Search-Time Integration
At search time, bin boosts are loaded using BinScoreReader. To enable
anytime ranking:
```
AnytimeRankingSearcher searcher =
    new AnytimeRankingSearcher(reader, topK, slaMs, fieldName);
TopDocs results = searcher.search(query);
```
Internally:
• Bin scores are applied per segment at query time.
• The collector monitors elapsed time and stops scoring once SLA
is exhausted.
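The time check inside the collector can be pictured with the following sketch. It is not the patch's collector: `DeadlineTopK`, the injectable clock, and the per-document deadline check are assumptions made to keep the example small and testable.

```
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntToDoubleFunction;
import java.util.function.LongSupplier;

/** Hypothetical sketch of deadline-bounded top-k collection, not the patch's collector. */
final class DeadlineTopK {

  /** Scores candidates in order until the budget is spent; returns doc IDs, best score first. */
  static List<Integer> collect(
      int[] candidates,
      IntToDoubleFunction scorer,
      int topK,
      long budgetNanos,
      LongSupplier nanoClock) {
    long start = nanoClock.getAsLong();
    // Min-heap on score, so the weakest of the current top-k is evicted first.
    PriorityQueue<double[]> heap =
        new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[0]));
    for (int doc : candidates) {
      if (nanoClock.getAsLong() - start >= budgetNanos) {
        break; // budget exhausted: return what has been collected so far
      }
      heap.add(new double[] {scorer.applyAsDouble(doc), doc});
      if (heap.size() > topK) {
        heap.poll();
      }
    }
    List<Integer> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      result.add(0, (int) heap.poll()[1]); // lowest score polled first, so prepend
    }
    return result;
  }
}
```

Passing `System::nanoTime` as the clock gives wall-clock behavior. Visiting candidates from high-boost bins first is what would make an early exit cheap in terms of quality, since the documents left unscored come from lower-priority bins.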
## Test Coverage
Includes a full test (TestAnytimeRankingSearchQuality) that:
• Indexes 10k docs with periodic relevant content
• Runs baseline and anytime search
• Computes NDCG, precision, recall
• Asserts average and max position delta across result sets
• Verifies minimal degradation under SLA constraints
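For readers unfamiliar with the metric, here is one common NDCG formulation as a minimal sketch. `RankingMetrics` is a hypothetical name, and the linear gain with a log2 position discount is an assumption; the test in this patch may compute NDCG differently.

```
import java.util.Arrays;

/** Hypothetical metric helper, not the code used by TestAnytimeRankingSearchQuality. */
final class RankingMetrics {

  /** DCG@k with linear gain: sum of relevance[i] / log2(i + 2) over the first k positions. */
  static double dcg(int[] relevance, int k) {
    double sum = 0.0;
    for (int i = 0; i < Math.min(k, relevance.length); i++) {
      sum += relevance[i] / (Math.log(i + 2) / Math.log(2));
    }
    return sum;
  }

  /** NDCG@k: DCG of the given ranking divided by DCG of the ideal (descending) ranking. */
  static double ndcg(int[] relevance, int k) {
    int[] ideal = relevance.clone();
    Arrays.sort(ideal);
    // Arrays.sort is ascending; reverse in place to get the ideal ordering.
    for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
      int tmp = ideal[i];
      ideal[i] = ideal[j];
      ideal[j] = tmp;
    }
    double idealDcg = dcg(ideal, k);
    return idealDcg == 0.0 ? 0.0 : dcg(relevance, k) / idealDcg;
  }
}
```

Here `relevance[i]` is the graded relevance of the document returned at rank i, so comparing `ndcg` for the baseline and anytime result lists gives a single number for ranking degradation.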
## Performance
• AnytimeRankingSearcher provides ~2–3x speedup at low SLA targets
• Recall, precision, and NDCG stay at 95% or more of their baseline values
• Position delta of relevant docs remains bounded
## Notes
• Readers are wrapped using BinScoreUtil.wrap(reader) to enable
bin-aware scoring
• Compound readers are tracked and closed explicitly
• BinFilter skipping is not implemented yet; it will be added in a
follow-up patch
• Fallback to approximate binning ensures indexing remains
scalable for large segments