On Mon, Jun 08, 2026 at 09:48:34AM -0400, Michael S. Tsirkin wrote: > On Mon, Jun 08, 2026 at 10:43:21AM +0100, Lorenzo Stoakes wrote: > > On Mon, Jun 08, 2026 at 04:34:23AM -0400, Michael S. Tsirkin wrote: > > > TestSetPageHWPoison() is called without zone->lock, so its atomic > > > update to page->flags can race with non-atomic flag operations > > > that run under zone->lock in the buddy allocator. > > > > > > In particular, __free_pages_prepare() does: > > > > > > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > > > > > > This non-atomic read-modify-write, while correctly excluding > > > __PG_HWPOISON from the mask, can still lose a concurrent > > > TestSetPageHWPoison if the read happens before the poison bit > > > is set and the write happens after. Follow-up patches in this > > > series add similar non-atomic flag operations as well. > > > > > > Fix by acquiring zone->lock around TestSetPageHWPoison and > > > around ClearPageHWPoison in the retry path. This > > > serializes with all buddy flag manipulation. The cost is > > > negligible: one lock/unlock in an extremely rare path > > > (hardware memory errors). > > > > > > Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > > > in this file operate on pages already removed from the buddy > > > allocator or on non-buddy pages (DAX, hugetlb), so they do not > > > need zone->lock protection. > > > > > > Acked-by: Miaohe Lin <[email protected]> > > > Signed-off-by: Michael S. Tsirkin <[email protected]> > > > > Can we have Fixes: and Cc: stable and also send this separately please? > > > > These patches seem like unrelated fixups that you've discovered along the > > way, > > and don't belong as part of the already rather large series, unless I'm > > missing > > something here. > > > > Thanks, Lorenzo > > I think you are mising that they are a dependency, not unrelated.
Then say so. > For example, this issue gets worse with the patchset as there are more > places that manipulate flags without atomics. No? It's your job to make that case, not mine. > > > You are welcome to send this to stable, but I think stable rules > preclude theoretical bugfixes. It's a dependency but also theoretical? > > As for Fixes: the issue has been there for decades. I wouldn't know > what to attribute it for. Again, your job. > > > I guess I could send these separately, too, why not. Not sure > what this accomplishes, but hey. But is that an ack? You want > this fix merged even before the feature? I already made the case as to why, as have other maintainers. If you need to review what an ack looks like please consult https://docs.kernel.org/process/5.Posting.html Thanks, Lorenzo

