[I] Improve Lucene's I/O concurrency [lucene]

via GitHub Wed, 13 Mar 2024 04:13:41 -0700


jpountz opened a new issue, #13179:
URL: https://github.com/apache/lucene/issues/13179


   ### Description
   
   Currently, Lucene's I/O concurrency is bound by the search concurrency. If 
`IndexSearcher` runs on N threads, then Lucene will never perform more than N 
I/Os concurrently. Unless you significantly overprovision your search thread 
pool, which is bad for other reasons, Lucene will bottleneck on I/O latency 
without even maxing out the IOPS of the host.
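   
   As a rough illustration of this bound (the numbers are assumptions chosen for the arithmetic, not measurements): if each of the N search threads blocks on one I/O at a time and a single I/O takes latency ℓ, the achievable I/O rate is at most N/ℓ.

$$
\text{IOPS}_{\max} = \frac{N}{\ell}, \qquad \text{e.g. } N = 8,\ \ell = 1\,\text{ms} \;\Rightarrow\; \text{IOPS}_{\max} = 8000
$$

   A host whose storage can sustain more than that stays underused unless each thread keeps several I/Os in flight.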
   
   I don't think that Lucene should fully embrace asynchrony in its APIs, 
or query evaluation would become overly complicated. But I still expect that we 
have a lot of room for improvement to allow each search thread to perform 
multiple I/Os concurrently under the hood when needed.
   
   Some examples:
    - When running a query on two terms, e.g. `apache OR lucene`, could the I/O 
lookups in the `tim` file (terms dictionary) be performed concurrently for both 
terms?
    - When running a query on two terms, once the start offsets in the `doc` 
file (postings) have been resolved, could we start loading the first bytes of 
both postings lists from disk concurrently?
    - When fetching the top N=100 stored documents that match a query, could we 
load bytes from the `fdt` file (stored fields) for all these documents 
concurrently?
   
   This would require changes to our `Directory` APIs and to some low-level 
`IndexReader` APIs (`TermsEnum`, `StoredFieldsReader`?).
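   
   To make the stored-fields example concrete, here is a minimal sketch using only JDK classes, not Lucene's `Directory` API. The class `ConcurrentRangeReads` and the method `readRanges` are hypothetical names for illustration. It shows a single calling thread issuing several positional reads before waiting on any of them, which is the behavior the examples above ask for.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration only: this is NOT a Lucene API.
class ConcurrentRangeReads {

  /**
   * Reads up to {@code length} bytes at each offset. All reads are submitted
   * first, and only then does the calling thread wait for their results, so
   * the reads can be in flight at the same time.
   */
  static List<byte[]> readRanges(Path file, long[] offsets, int length)
      throws IOException, InterruptedException, ExecutionException {
    try (AsynchronousFileChannel channel =
        AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
      List<ByteBuffer> buffers = new ArrayList<>();
      List<Future<Integer>> pending = new ArrayList<>();

      // Phase 1: submit every read without blocking on any of them.
      for (long offset : offsets) {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        buffers.add(buffer);
        pending.add(channel.read(buffer, offset));
      }

      // Phase 2: collect results in submission order.
      List<byte[]> results = new ArrayList<>();
      for (int i = 0; i < pending.size(); i++) {
        pending.get(i).get(); // blocks until this read completes
        ByteBuffer buffer = buffers.get(i);
        buffer.flip(); // limit = number of bytes actually read (0 at EOF)
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        results.add(bytes);
      }
      return results;
    }
  }
}
```

   A caller would pass the file and the start offsets of the ranges it needs, e.g. `readRanges(path, new long[] {0, 4096, 8192}, 512)`. A single read may return fewer bytes than requested, so real code would loop until each range is complete. How the JDK executes these reads (thread pool or native asynchronous I/O) depends on the platform.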


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

