jpountz opened a new issue, #13179:
URL: https://github.com/apache/lucene/issues/13179
### Description
Currently, Lucene's I/O concurrency is bound by the search concurrency. If
`IndexSearcher` runs on N threads, then Lucene will never perform more than N
I/Os concurrently. Unless you significantly overprovision your search thread
pool, which is bad for other reasons, Lucene will bottleneck on I/O latency
without even maxing out the IOPS of the host.
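To make the bottleneck concrete, here is a rough back-of-the-envelope sketch based on Little's law (throughput = requests in flight / latency per request). The thread count and the 100 µs read latency are illustrative assumptions, not measurements; with them, 8 blocking threads top out at 80,000 IOPS regardless of what the device could sustain.

```java
/** Illustrative arithmetic only; the numbers in main are assumptions, not benchmarks. */
final class IoConcurrencyBound {

    /** Little's law: upper bound on IOPS given the number of I/Os in flight and per-I/O latency. */
    static double maxIops(int concurrentIos, double latencyMicros) {
        return concurrentIos * 1_000_000.0 / latencyMicros;
    }

    public static void main(String[] args) {
        int searchThreads = 8;        // assumed size of the search thread pool
        double latencyMicros = 100.0; // assumed latency of one random read

        // One blocking I/O per search thread: concurrency equals the thread count.
        System.out.printf("1 I/O per thread:  %.0f IOPS%n",
                maxIops(searchThreads, latencyMicros));

        // Same thread count, but each thread keeps 4 I/Os in flight.
        System.out.printf("4 I/Os per thread: %.0f IOPS%n",
                maxIops(searchThreads * 4, latencyMicros));
    }
}
```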
I don't think that Lucene should fully embrace asynchrony in its APIs,
or query evaluation would become overly complicated. But I still expect that
there is a lot of room for improvement by allowing each search thread to
perform multiple I/Os concurrently under the hood when needed.
Some examples:
- When running a query on two terms, e.g. `apache OR lucene`, could the I/O
lookups in the `tim` file (terms dictionary) be performed concurrently for both
terms?
- When running a query on two terms, once the start offsets in the `doc` file
(postings) have been resolved, could we start loading the first bytes of
these postings lists from disk concurrently?
- When fetching the top N=100 stored documents that match a query, could we
load bytes from the `fdt` file (stored fields) for all these documents
concurrently?
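As a minimal sketch of the pattern behind these examples, the snippet below has a single thread submit several positional reads before waiting on any of them. It uses the JDK's `AsynchronousFileChannel` purely for illustration; it is not how Lucene reads files, the class name and the toy file layout are made up, and the JDK may service these reads from an internal thread pool, so this shows the calling pattern rather than a specific kernel mechanism.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Illustration only: one caller thread, many reads in flight. */
final class ConcurrentReadsSketch {

    /** Reads up to {@code length} bytes at each offset, submitting all reads before waiting. */
    static List<byte[]> readAll(Path file, long[] offsets, int length)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            List<ByteBuffer> buffers = new ArrayList<>();
            List<Future<Integer>> pending = new ArrayList<>();
            // Phase 1: submit every read without blocking on any of them.
            for (long offset : offsets) {
                ByteBuffer buf = ByteBuffer.allocate(length);
                buffers.add(buf);
                pending.add(ch.read(buf, offset));
            }
            // Phase 2: wait for the results in order.
            List<byte[]> out = new ArrayList<>();
            for (int i = 0; i < offsets.length; i++) {
                ByteBuffer buf = buffers.get(i);
                int n = pending.get(i).get();
                // A read may return fewer bytes than requested; finish it here.
                while (n != -1 && buf.hasRemaining()) {
                    n = ch.read(buf, offsets[i] + buf.position()).get();
                }
                buf.flip();
                byte[] bytes = new byte[buf.remaining()];
                buf.get(bytes);
                out.add(bytes);
            }
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy stand-in for a stored-fields file: four fixed-size 4-byte "documents".
        Path tmp = Files.createTempFile("fdt-like", ".bin");
        try {
            Files.write(tmp, "doc0doc1doc2doc3".getBytes(StandardCharsets.US_ASCII));
            for (byte[] doc : readAll(tmp, new long[] {0, 8, 12}, 4)) {
                System.out.println(new String(doc, StandardCharsets.US_ASCII));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The caller still sees a blocking method, which matches the goal of keeping query evaluation synchronous while overlapping the I/O underneath.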
This would require changes to our `Directory` APIs, and to some low-level
`IndexReader` APIs (`TermsEnum`, `StoredFieldsReader`?).
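One possible shape for such a change, sketched under heavy assumptions: keep reads blocking, and add a non-blocking hint that starts loading a byte range early. `PrefetchingInput`, `prefetch` and `readBytes` below are hypothetical names invented for this sketch, not existing Lucene APIs, and the implementation is a toy built on the JDK's `AsynchronousFileChannel`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical input (not a Lucene class): blocking reads plus a non-blocking prefetch hint. */
final class PrefetchingInput implements AutoCloseable {

    /** A read that has been started but not yet consumed. */
    private static final class Pending {
        final ByteBuffer buffer;
        final Future<Integer> read;
        Pending(ByteBuffer buffer, Future<Integer> read) {
            this.buffer = buffer;
            this.read = read;
        }
    }

    private final AsynchronousFileChannel channel;
    // Keyed by start offset; not thread-safe, meant for use by a single search thread.
    private final Map<Long, Pending> pending = new HashMap<>();

    PrefetchingInput(Path file) throws IOException {
        this.channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
    }

    /** Hint: starts loading [offset, offset + length) and returns immediately. */
    void prefetch(long offset, int length) {
        ByteBuffer buf = ByteBuffer.allocate(length);
        pending.put(offset, new Pending(buf, channel.read(buf, offset)));
    }

    /** Blocking read; reuses a prefetch started at the same offset if it is large enough. */
    byte[] readBytes(long offset, int length) throws IOException {
        Pending p = pending.remove(offset);
        ByteBuffer buf;
        try {
            int n;
            if (p != null && p.buffer.capacity() >= length) {
                buf = p.buffer;
                n = p.read.get(); // usually already complete if the hint came early enough
            } else {
                buf = ByteBuffer.allocate(length);
                n = channel.read(buf, offset).get();
            }
            // Finish short reads, stopping at end of file.
            while (n != -1 && buf.position() < length) {
                n = channel.read(buf, offset + buf.position()).get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        byte[] out = new byte[Math.min(length, buf.position())];
        buf.flip();
        buf.get(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

A caller evaluating `apache OR lucene` could then call `prefetch` for both terms' offsets first and only afterwards call `readBytes` for each, so the two lookups overlap while the calling code stays synchronous.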