Hi,

There are two related but orthogonal parts to this:

1. The refactoring to IOContext and hints, that Simon has described.
2. The default advice that Lucene should use out-of-the-box.

I believe that we are in good shape to complete no. 1. For no. 2, we discussed 
this in https://github.com/apache/lucene/issues/14408 - the conclusion is that 
we revert the default back to NORMAL.

With this, Lucene does not set MADV_RANDOM unless the user opts in, and 
no. 1 makes opting in much easier.
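
To make the "NORMAL by default, anything else by opt-in" idea concrete, here 
is a rough, self-contained Java sketch. It is only an illustration: the Hint 
and Advice enums and the two policy functions are hypothetical stand-ins 
invented for this mail, not Lucene's IOContext, ReadAdvice or MMapDirectory 
API, and treating the default as ignoring every hint is a simplifying 
assumption.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Function;

final class AdviceDefaultsSketch {
    // Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
    enum Hint { RANDOM_ACCESS, SEQUENTIAL_ACCESS }
    enum Advice { NORMAL, RANDOM, SEQUENTIAL }

    // Assumed out-of-the-box policy: ignore the hints and keep the OS default.
    static final Function<Set<Hint>, Advice> DEFAULT_POLICY = hints -> Advice.NORMAL;

    // Assumed opt-in policy: translate hints into a more specific advice.
    static final Function<Set<Hint>, Advice> OPT_IN_POLICY = hints -> {
        if (hints.contains(Hint.SEQUENTIAL_ACCESS)) return Advice.SEQUENTIAL;
        if (hints.contains(Hint.RANDOM_ACCESS)) return Advice.RANDOM;
        return Advice.NORMAL;
    };

    // Resolve the advice for a file from its hints under the chosen policy.
    static Advice resolve(Set<Hint> hints, Function<Set<Hint>, Advice> policy) {
        return policy.apply(hints);
    }

    public static void main(String[] args) {
        Set<Hint> hints = EnumSet.of(Hint.RANDOM_ACCESS);
        System.out.println(resolve(hints, DEFAULT_POLICY)); // prints NORMAL
        System.out.println(resolve(hints, OPT_IN_POLICY));  // prints RANDOM
    }
}
```

The point of the sketch is only that the same hints can resolve to different 
advice depending on a policy the user chooses; which real hints map to which 
advice is for the directory implementation to decide.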

-Chris.

> On 8 Aug 2025, at 09:40, Simon Cooper <simon.coo...@elastic.co.INVALID> wrote:
> 
> As I've been working in this area, here's my 2c...
> 
> The move from ReadAdvice to IOContext hints is as yet unfinished; 
> https://github.com/apache/lucene/pull/14977 and 
> https://github.com/apache/lucene/pull/14844 will finish it off. Once those 
> are merged, ReadAdvice will only be used as an implementation detail of 
> MMapDirectory and related classes; core Lucene classes will only deal with 
> IOContext and hints. By subclassing MMapDirectory, you can modify the hints 
> that are passed down to the base implementation as you need to, and/or 
> specify your own hints or IOContext implementations to help refine the 
> behaviour you need.
> 
> It will then be up to each directory implementation to look at the hints 
> specified, and use those to inform how it should open the files. At the 
> moment, MMapDirectory is the only one which does this, and it does this using 
> different ReadAdvices based on the hints. Exactly which ReadAdvice is used 
> for a particular combination of hints can be modified. I'm also not sure 
> where NORMAL or RANDOM is best used, but I've tried to keep current behaviour 
> unchanged as much as possible so far.
> 
> SimonC
> 
> On Thu, 7 Aug 2025 at 22:03, Michael Sokolov <soko...@falutin.net.invalid> 
> wrote:
> 
> I want to raise an issue here that has come up before which is about the 
> choices we have made to apply madvise flags in an opinionated way. 
> 
> In our environment, the choices Lucene is making are really detrimental to 
> our indexing throughput. In the past we had disabled this by subclassing 
> MMapDirectory (a super expert workaround). Somehow we missed the fact that 
> changes in Lucene 10 made this workaround ineffective and it took us a while 
> to find the new recommended workaround, which is a system property setting.  
> In an excess (perhaps) of caution, instead of the sysprop we've opted to 
> modify a Lucene fork to disable this in a more fundamental way (cauterizing 
> PosixNativeAccess.madvise), hoping that this might insulate us against 
> future changes in this area. But we don't want to have to engage in 
> this kind of paranoid programming!
> 
> Lucene has made a choice that may be good for some environments or operating 
> conditions, but not for others, and the difference can be pretty dramatic. 
> I'm not sure how we came to decide that the current default is better than 
> the old one?  I'll also say I don't really understand why MADV_RANDOM is 
> hurting us so much, but it does cause our merge operations to get much 
> slower, fall behind, and pile up to the extent that low-resource environments 
> (that used to work fine with MADV_NORMAL) are crumbling under the weight of 
> pending merges. 
> 
> Another thread is that the multiple layers of abstraction we have today 
> (IOContext + ReadAdvice + DataAccessHint + FileDataHint + madvise) make it 
> quite difficult to reason about what OS behavior is happening for any given 
> IO operation. I read the IOContext javadocs but they only give general 
> information and don't explain how hints are used to determine an actual MADV 
> flag.  In what circumstance should I use a hint vs an advice? The 
> IndexInput.updateReadAdvice javadoc actually says "provide a hint" but 
> accepts an advice.
> 
> So to summarize: 
> 
>     • Selfishly, I don't like the current default MADV setting Lucene has 
> chosen, although I recognize it's possible it may work for some use case.  
> But I do wonder at some level if the OS's default shouldn't be a good default 
> setting?
>     • I find the Lucene API in this area confusing and not well-documented.  
> Understanding that the IO contexts are many and varied and could profitably 
> be tuned differently, I wonder if we could have a centralized and first-class 
> API (not a system property) that can be used to set a memory access profile 
> of some sort?
> 
> I think some evidence supporting the choices we have made today (why is the 
> default MADV_RANDOM) would be helpful as a starting point. Maybe there is a 
> past thread I overlooked?


