On Tue, Jan 06, 2026 at 09:19:59PM +0100, Mikulas Patocka wrote:
>
>
> On Tue, 6 Jan 2026, Liam R. Howlett wrote:
>
> > * Mikulas Patocka <[email protected]> [260105 15:08]:
> > >
> > > > If you only get the error message sometimes, does that mean there is
> > > > another signal check that isn't covered by this change - or another call
> > > > path?
> > >
> > > This call path is also triggered by -EINTR from mm_take_all_locks:
> > > "init_user_pages -> amdgpu_hmm_register -> mmu_interval_notifier_insert ->
> > > mmu_notifier_register -> __mmu_notifier_register -> mm_take_all_locks ->
> > > return -EINTR". I am not expert in the GPU code, so I don't know how much
> > > serious it is.
> >
> > Okay, so the other call paths also end up getting the -EINTR from this
> > function? Can you please add that detail to the commit message?
>
> Yes. I'd like to ask the GPU people to look at it and say how much damage
> this -EINTR could do. I don't know - I just saw the messages "Failed to
> register MMU notifier: -4" in the syslog.
>
> > This means that -EINTR can no longer be returned from open(), right?
> > Otherwise you are just reducing a race condition between open() and a
> > signal entering from your timer.
>
> EINTR can be returned from open() in cases when it was historically
> behaving this way - such as opening a fifo when there is no matching
> process having it open.
>
> But I think that opening /dev/kfd doesn't fall into this category.
>

Well, it's a device - opening can and often does have side-effects. It's
not too far-fetched to -EINTR here.

> NFS has an "intr" flag that makes the filesystem syscalls interruptible by
> signals. It is off by default, because many programs don't expect EINTR
> when opening, reading or writing plain files on a filesystem.
>
> > Any other -EINTR system call will also cause you problems since you
> > continuously send signals to your process, so we'll have to change them
> > all for this to work?
>
> I use SA_RESTART for the signals. And I retry all the syscalls on EINTR
> just in case SA_RESTART didn't work. So, I don't experience random
> failures in my code due to the periodic signal.
>
> But there is code that I have no control over - such as the OpenCL shared
> library.

Right. So I am wondering if just returning -ERESTARTSYS (whether in
mm_take_all_locks(), or in the AMD driver) would satisfy both parties.
Folks installing and using signals need to pay attention and set
SA_RESTART, but that's already best practice when dealing with
third-party code. open(2) should be transparently restartable.

WDYT?

--
Pedro
