mikemccand opened a new issue, #13193:
URL: https://github.com/apache/lucene/issues/13193

   ### Description
   
   (Spinoff from tricky discussions on 
https://github.com/apache/lucene/pull/13190).
   
   Lucene's `ConcurrentMergeScheduler` allows users to set a `mbPerSec` rate 
limit on bytes written for each merge.  It's a complex feature to implement, 
and the [new intra-merge concurrency coming 
shortly](https://github.com/apache/lucene/pull/13190) makes it even trickier 
(see above PR).  It is also best effort since separate threads writing bytes 
during merging "check in" only periodically after enough bytes have been 
written on their private thread.
   
   However, there is one known issue that I would call a bug (and not a "best effort" limitation): it uses a naive instantaneous measure of the IO rate, meaning it simply compares the time of the last "check-in" from a writing thread against the current time and the current number of bytes written, and decides whether to pause.  This is somewhat brutal since IO writes can be bursty: merging structures like postings sometimes requires quite a bit of CPU effort between writes, and other times almost none (e.g. merging a large postings list).  The instantaneous approach we take today gives no credit for a longish period of time when no/few bytes were written.
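   In rough pseudocode, the instantaneous check described above might look like this (a hypothetical sketch for illustration only; the class and method names are invented and this is not Lucene's actual `RateLimiter` code):

```java
// Hypothetical sketch of an instantaneous-rate limiter; names are invented,
// not Lucene's actual implementation.
class InstantRateLimiter {
  private final double bytesPerSec;
  private long lastCheckNanos;

  InstantRateLimiter(double mbPerSec, long startNanos) {
    this.bytesPerSec = mbPerSec * 1024 * 1024;
    this.lastCheckNanos = startNanos;
  }

  /** Nanoseconds the writing thread should pause after its periodic check-in. */
  long pauseNanos(long bytesSinceLastCheck, long nowNanos) {
    double elapsedSec = (nowNanos - lastCheckNanos) / 1e9;
    lastCheckNanos = nowNanos;
    // Wall-clock time these bytes "should" take at the target rate:
    double targetSec = bytesSinceLastCheck / bytesPerSec;
    // Any surplus elapsed time (an earlier lull) is simply discarded.
    return targetSec > elapsedSec ? (long) ((targetSec - elapsedSec) * 1e9) : 0L;
  }
}
```

   At a 10 MB/sec limit, a thread that spends two seconds of pure CPU work and then writes 10 MB twice in quick succession is forced to pause a full second on the second write, even though it earned plenty of "budget" during the lull.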
   
   It's sort of like telling a runner in a race that they get no credit for running slowly for a while to catch their breath, and are not allowed to then sprint.
   
   This means that if you were to divide total bytes written by total time taken, the net `mbPerSec` would likely be far below the specified limit.
   
   A better model would be something like how AWS (and likely other cloud providers) handles IOPS limits on an EC2 instance using a "burst bucket" ("Burst IOPS is a feature of Amazon Web Services (AWS) EBS volume types that allows applications to store unused IOPS in a burst bucket and then drain them when needed" -- thank you Gemini for the summary).  It would keep some state, allowing a burst after a period of light IO, and would throttle more closely to the overall target rate while allowing some bursting to catch up after a lull.
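   A token-bucket variant might look something like this (again a hypothetical sketch, not a proposed API; `BurstBucketRateLimiter` and its parameters are invented for illustration):

```java
// Hypothetical token-bucket ("burst bucket") sketch; the class and its
// parameters are invented for illustration, not a proposed Lucene API.
class BurstBucketRateLimiter {
  private final double bytesPerSec;   // sustained target rate
  private final double maxBurstBytes; // bucket capacity
  private double tokens;              // spendable byte credit
  private long lastNanos;

  BurstBucketRateLimiter(double mbPerSec, double burstSeconds, long startNanos) {
    this.bytesPerSec = mbPerSec * 1024 * 1024;
    this.maxBurstBytes = bytesPerSec * burstSeconds;
    this.tokens = maxBurstBytes; // start with a full bucket
    this.lastNanos = startNanos;
  }

  /** Nanoseconds the caller should pause after writing {@code bytes}. */
  long pauseNanos(long bytes, long nowNanos) {
    // Refill: earn credit for elapsed wall-clock time, capped at the bucket size.
    tokens = Math.min(maxBurstBytes, tokens + (nowNanos - lastNanos) / 1e9 * bytesPerSec);
    lastNanos = nowNanos;
    tokens -= bytes;
    if (tokens >= 0) {
      return 0L; // burst credit covers this write: no pause
    }
    // Pause until the deficit refills at the sustained rate.
    return (long) (-tokens / bytesPerSec * 1e9);
  }
}
```

   Here a thread that was idle for a while banks credit (up to `burstSeconds` worth) and may sprint through it without pausing, while over longer windows the realized rate still converges to the configured limit.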
   
   We could maybe run some benchmarks to see in practice whether Lucene's IO 
during merging is smooth enough that this lack of burstiness isn't really 
hurting things much ...
   
   Alternatively, we could remove this complex, tricky-to-implement, best-effort-with-known-bugs feature of Lucene?  IO devices have only become more performant (many use cases store the Lucene index on SSDs now), machines have more RAM for the OS to cache writes and gradually spool them to disk, etc.  This feature is also likely to be misused, making users think Lucene cannot keep up with merging ...
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

