FYI - I opened the following PR to change the default read advice back to 
NORMAL.

https://github.com/apache/lucene/pull/15040

We can continue the discussion there.

-Chris.

> On 8 Aug 2025, at 10:03, Chris Hegarty <christopher.hega...@elastic.co> wrote:
> 
> Hi,
> 
> There are two related but orthogonal parts to this:
> 
> 1. The refactoring to IOContext and hints, that Simon has described.
> 2. The default advice that Lucene should use out-of-the-box.
> 
> I believe that we are in good shape to complete no.1. For no.2, we 
> discussed this in the following issue 
> https://github.com/apache/lucene/issues/14408 - the conclusion is that we 
> revert the default back to NORMAL.
> 
> With this, Lucene does not set MADV_RANDOM unless the user opts in, and 
> opting in is greatly improved by no.1.
> 
> -Chris.
> 
>> On 8 Aug 2025, at 09:40, Simon Cooper <simon.coo...@elastic.co.INVALID> 
>> wrote:
>> 
>> As I've been working in this area, here's my 2c...
>> 
>> The move from ReadAdvice to IOContext hints is as yet unfinished; 
>> https://github.com/apache/lucene/pull/14977 and 
>> https://github.com/apache/lucene/pull/14844 will finish it off. Once those 
>> are merged, ReadAdvice will only be used as an implementation detail of 
>> MMapDirectory and related classes; core Lucene classes will only deal with 
>> IOContext and hints. By subclassing MMapDirectory, you can modify the hints 
>> that are passed down to the base implementation as you need to, and/or 
>> specify your own hints or IOContext implementations to help refine the 
>> behaviour you need.
>> 
>> It will then be up to each directory implementation to look at the hints 
>> specified, and use those to inform how it should open the files. At the 
>> moment, MMapDirectory is the only one which does this, and it does this 
>> using different ReadAdvices based on the hints. Exactly which ReadAdvice is 
>> used for a particular combination of hints can be modified. I'm also not 
>> sure where NORMAL or RANDOM is best used, but I've tried to keep current 
>> behaviour unchanged as much as possible so far.
>> 
>> SimonC
>> 
>> On Thu, 7 Aug 2025 at 22:03, Michael Sokolov <soko...@falutin.net.invalid> 
>> wrote:
>> I want to raise an issue here that has come up before which is about the 
>> choices we have made to apply madvise flags in an opinionated way. 
>> 
>> In our environment, the choices Lucene is making are really detrimental to 
>> our indexing throughput. In the past we had disabled this by subclassing 
>> MMapDirectory (a super expert workaround). Somehow we missed the fact that 
>> changes in Lucene 10 made this workaround ineffective and it took us a while 
>> to find the new recommended workaround, which is a system property setting.  
>> In an excess (perhaps) of caution, instead of the sysprop we've opted to 
>> modify a Lucene fork to disable this in a more fundamental way (cauterizing 
>> PosixNativeAccess.madvise), I think hoping that this might insulate us 
>> against future changes in this area? But we don't want to have to engage in 
>> this kind of paranoid programming!
>> 
>> Lucene has made a choice that may be good for some environments or operating 
>> conditions, but not for others, and the difference can be pretty dramatic. 
>> I'm not sure how we came to decide that the current default is better than 
>> the old one?  I'll also say I don't really understand why the MADV_RANDOM is 
>> hurting us so much, but it does cause our merge operations to get much 
>> slower, fall behind, and pile up to the extent that low-resource 
>> environments (that used to work fine with MADV_NORMAL) are crumbling under 
>> the weight of pending merges. 
>> 
>> Another thread is that the multiple layers of abstraction we have today 
>> (IOContext + ReadAdvice + DataAccessHint + FileDataHint + madvise) make it 
>> quite difficult to reason about what OS behavior is happening for any given 
>> IO operation. I read the IOContext javadocs but they only give general 
>> information and don't explain how hints are used to determine an actual MADV 
>> flag.  In what circumstance should I use a hint vs an advice? The 
>> IndexInput.updateReadAdvice javadoc actually says "provide a hint" but 
>> accepts an advice.
>> 
>> So to summarize: 
>> 
>>    • Selfishly, I don't like the current default MADV setting Lucene has 
>> chosen, although I recognize it's possible it may work for some use case.  
>> But I do wonder at some level if the OS's default shouldn't be a good 
>> default setting?
>>    • I find the Lucene API in this area confusing and not well-documented.  
>> Understanding that the IO contexts are many and varied and could profitably 
>> be tuned differently, I wonder if we could have a centralized and 
>> first-class API (not a system property) that can be used to set a memory 
>> access profile of some sort?
>> 
>> I think some evidence supporting the choices we have made today (why is the 
>> default MADV_RANDOM) would be helpful as a starting point. Maybe there is a 
>> past thread I overlooked?
> 
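
For readers following the thread, here is a rough sketch of the idea Simon 
describes above: a directory implementation looks at the hints it is given 
and picks an advice, falling back to a configurable default when no hint 
applies. This is not Lucene's API. The class AdviceSketch, the Hint and 
Advice enums, and the resolve method are all hypothetical names invented for 
illustration, and the mapping rules are assumptions, not the rules 
MMapDirectory actually applies.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, self-contained model of "hints in, advice out".
// None of these types are Lucene classes.
final class AdviceSketch {

    // Stand-in for madvise-style advice values (NORMAL and RANDOM are the
    // two discussed in the thread; SEQUENTIAL is added for illustration).
    enum Advice { NORMAL, RANDOM, SEQUENTIAL }

    // Hypothetical hints a caller might attach when opening a file.
    enum Hint { RANDOM_ACCESS, SEQUENTIAL_ACCESS, MERGE }

    // Pick an advice from the hints; if no hint applies, use the default.
    // Assumed rule: sequential-style hints win over random-access hints.
    static Advice resolve(Set<Hint> hints, Advice defaultAdvice) {
        if (hints.contains(Hint.MERGE) || hints.contains(Hint.SEQUENTIAL_ACCESS)) {
            return Advice.SEQUENTIAL;
        }
        if (hints.contains(Hint.RANDOM_ACCESS)) {
            return Advice.RANDOM;
        }
        return defaultAdvice;
    }

    public static void main(String[] args) {
        // With a NORMAL default, RANDOM is only chosen when a caller opts in.
        System.out.println(resolve(EnumSet.noneOf(Hint.class), Advice.NORMAL));
        System.out.println(resolve(EnumSet.of(Hint.RANDOM_ACCESS), Advice.NORMAL));
        System.out.println(resolve(EnumSet.of(Hint.MERGE), Advice.NORMAL));
    }
}
```

In this model, changing the default from RANDOM to NORMAL only affects files 
opened without any hint, which is the opt-in behaviour Chris outlines for 
no.2.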


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
