FYI - I opened the following PR to change the default read advice back to NORMAL.
https://github.com/apache/lucene/pull/15040

We can continue the discussion there.

-Chris.

> On 8 Aug 2025, at 10:03, Chris Hegarty <christopher.hega...@elastic.co> wrote:
>
> Hi,
>
> There are two related but orthogonal parts to this:
>
> 1. The refactoring to IOContext and hints, that Simon has described.
> 2. The default advice that Lucene should use out-of-the-box.
>
> I believe that we are in good shape to complete no.1. For no.2, we
> discussed this in the following issue
> https://github.com/apache/lucene/issues/14408 - the conclusion is that we
> revert the default back to NORMAL.
>
> With this, Lucene does not set MADV_RANDOM unless the user opts in -
> which is greatly improved by no.1.
>
> -Chris.
>
>> On 8 Aug 2025, at 09:40, Simon Cooper <simon.coo...@elastic.co.INVALID>
>> wrote:
>>
>> As I've been working in this area, here's my 2c...
>>
>> The move from ReadAdvice to IOContext hints is as yet unfinished;
>> https://github.com/apache/lucene/pull/14977 and
>> https://github.com/apache/lucene/pull/14844 will finish it off. Once those
>> are merged, ReadAdvice will only be used as an implementation detail of
>> MMapDirectory and related classes; core Lucene classes will only deal with
>> IOContext and hints. By subclassing MMapDirectory, you can modify the hints
>> that are passed down to the base implementation as you need to, and/or
>> specify your own hints or IOContext implementations to help refine the
>> behaviour you need.
>>
>> It will then be up to each directory implementation to look at the hints
>> specified, and use those to inform how it should open the files. At the
>> moment, MMapDirectory is the only one which does this, and it does this
>> using different ReadAdvices based on the hints. Exactly which ReadAdvice is
>> used for a particular combination of hints can be modified. I'm also not
>> sure where NORMAL or RANDOM is best used, but I've tried to keep current
>> behaviour unchanged as much as possible so far.
>>
>> SimonC
>>
>> On Thu, 7 Aug 2025 at 22:03, Michael Sokolov <soko...@falutin.net.invalid>
>> wrote:
>> I want to raise an issue here that has come up before: the choices we
>> have made to apply madvise flags in an opinionated way.
>>
>> In our environment, the choices Lucene is making are really detrimental to
>> our indexing throughput. In the past we had disabled this by subclassing
>> MMapDirectory (a super expert workaround). Somehow we missed the fact that
>> changes in Lucene 10 made this workaround ineffective and it took us a while
>> to find the new recommended workaround, which is a system property setting.
>> In an excess (perhaps) of caution, instead of the sysprop we've opted to
>> modify a Lucene fork to disable this in a more fundamental way (cauterizing
>> PosixNativeAccess.madvise), I think hoping that this might insulate us
>> against future changes in this area? But we don't want to have to engage in
>> this kind of paranoid programming!
>>
>> Lucene has made a choice that may be good for some environments or operating
>> conditions, but not for others, and the difference can be pretty dramatic.
>> I'm not sure how we came to decide that the current default is better than
>> the old one? I'll also say I don't really understand why MADV_RANDOM is
>> hurting us so much, but it does cause our merge operations to get much
>> slower, fall behind, and pile up to the extent that low-resource
>> environments (that used to work fine with MADV_NORMAL) are crumbling under
>> the weight of pending merges.
>>
>> Another thread is that the multiple layers of abstraction we have today
>> (IOContext + ReadAdvice + DataAccessHint + FileDataHint + madvise) make it
>> quite difficult to reason about what OS behavior is happening for any given
>> IO operation. I read the IOContext javadocs but they only give general
>> information and don't explain how hints are used to determine an actual MADV
>> flag.
>> In what circumstance should I use a hint vs an advice? The
>> IndexInput.updateReadAdvice javadoc actually says "provide a hint" but
>> accepts an advice.
>>
>> So to summarize:
>>
>> • Selfishly, I don't like the current default MADV setting Lucene has
>> chosen, although I recognize it's possible it may work for some use cases.
>> But I do wonder at some level if the OS's default shouldn't be a good
>> default setting?
>> • I find the Lucene API in this area confusing and not well-documented.
>> Understanding that the IO contexts are many and varied and could profitably
>> be tuned differently, I wonder if we could have a centralized and
>> first-class API (not a system property) that can be used to set a memory
>> access profile of some sort?
>>
>> I think some evidence supporting the choices we have made today (why is the
>> default MADV_RANDOM) would be helpful as a starting point. Maybe there is a
>> past thread I overlooked?