Re: Strange issues with epoll since 5.0

2019-05-02 Thread Davidlohr Bueso
On Thu, 02 May 2019, Deepa Dinamani wrote: Reported-by: Omar Kilani Do we actually know if this was the issue Omar was hitting? Thanks, Davidlohr

Re: Strange issues with epoll since 5.0

2019-05-02 Thread Eric Wong
Deepa Dinamani wrote: > Eric, > Can you please help test this? Nope, that was _really_ badly whitespace-damaged. (C'mon, it's not like you're new to this)

Re: Strange issues with epoll since 5.0

2019-05-02 Thread Deepa Dinamani
Eric, Can you please help test this? If this solves your problem, I can post the fix. Thanks, - Deepa -8<--- Subject: [PATCH] signal: Adjust error codes according to restore_user_sigmask() For all the syscalls that receive a sigmask from the userland, the user sigmask is to be in eff

Re: Strange issues with epoll since 5.0

2019-05-01 Thread Deepa Dinamani
On Wed, May 1, 2019 at 1:48 PM Eric Wong wrote: > > Deepa Dinamani wrote: > > So here is my analysis: > > > > > So the 854a6ed56839a40f6 seems to be better than the original code in > > that it detects the signal. > > OTOH, does matter to anybody that a signal is detected slightly > sooner than

Re: Strange issues with epoll since 5.0

2019-05-01 Thread Eric Wong
Deepa Dinamani wrote: > So here is my analysis: > So the 854a6ed56839a40f6 seems to be better than the original code in > that it detects the signal. OTOH, does matter to anybody that a signal is detected slightly sooner than it would've been, otherwise? > But, the problem is that it doesn't

Re: Strange issues with epoll since 5.0

2019-05-01 Thread Deepa Dinamani
Thanks for trying the fix. So here is my analysis: Let's start with epoll_pwait: ep_poll() is what checks for signal_pending() and is responsible for setting errno to -EINTR when there is a signal. So if a signal is received after ep_poll(), it is never noticed by the syscall during execution.
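To make the -EINTR behaviour described above concrete from the userspace side, here is a minimal sketch (not taken from the thread). It assumes Linux with glibc. The helper name `epoll_pwait_errno_with_pending_signal`, the choice of SIGUSR1, and the 1000 ms timeout are all illustrative assumptions. The signal is made pending before the call, so this exercises only the case where the wait itself notices the signal, not the race discussed in this thread where the signal arrives after ep_poll() has finished.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;

static void on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Hypothetical helper (not from the thread): returns the errno reported by
 * epoll_pwait(), 0 if the call did not fail, or -1 if setup failed.
 */
int epoll_pwait_errno_with_pending_signal(void)
{
    struct sigaction sa = { 0 };
    sigset_t blocked, old, during_wait;
    struct epoll_event ev;
    int epfd, rc, saved_errno;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    /* Block SIGUSR1 so it stays pending until epoll_pwait() unblocks it. */
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &blocked, &old) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    got_signal = 0;
    raise(SIGUSR1);                 /* pending now, not yet delivered */

    /* Mask used only for the duration of the wait: SIGUSR1 unblocked. */
    during_wait = old;
    sigdelset(&during_wait, SIGUSR1);

    rc = epoll_pwait(epfd, &ev, 1, 1000, &during_wait);
    saved_errno = (rc < 0) ? errno : 0;

    /* No fds were registered, so no events can have been reported. */
    assert(rc <= 0);

    close(epfd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return saved_errno;
}
```

Calling the helper from a small `main()` and comparing the result with `EINTR` shows whether the wait reported the interruption. A return of 0 would mean the call timed out without reporting the signal.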

Re: Strange issues with epoll since 5.0

2019-05-01 Thread Eric Wong
Eric Wong wrote: > (didn't test AIO, but everything else seems good) "seems" != "is" Now that I understand the fix for epoll, the fs/select.c changes would hit the same problem and not return -EINTR when it should. I'll let you guys decide how to fix this, but there's definitely a problem when
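The same expectation can be checked for the fs/select.c side with pselect(). This is again a hedged sketch rather than anything posted in the thread: the helper name `pselect_errno_with_pending_signal`, the use of SIGUSR2, and the one-second timeout are assumptions, and the signal is pending before the call, so only the straightforward path is covered.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

static void on_sigusr2(int signo)
{
    (void)signo;    /* the handler only needs to exist */
}

/*
 * Hypothetical helper (not from the thread): returns the errno reported by
 * pselect(), 0 if the call did not fail, or -1 if setup failed.
 */
int pselect_errno_with_pending_signal(void)
{
    struct sigaction sa = { 0 };
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    sigset_t blocked, old, during_wait;
    int rc, saved_errno;

    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;

    /* Keep SIGUSR2 blocked so it is delivered only inside pselect(). */
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &blocked, &old) < 0)
        return -1;

    raise(SIGUSR2);                 /* pending now, not yet delivered */

    during_wait = old;
    sigdelset(&during_wait, SIGUSR2);

    /* No fd sets: the call can end only by timeout or by a signal. */
    rc = pselect(0, NULL, NULL, NULL, &timeout, &during_wait);
    saved_errno = (rc < 0) ? errno : 0;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return saved_errno;
}
```

A caller would expect `EINTR` here. A return of 0 would mean the call timed out without reporting the signal.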

Re: Strange issues with epoll since 5.0

2019-04-30 Thread Eric Wong
Eric Wong wrote: > Deepa Dinamani wrote: > > I'm not sure what the hang in the userspace is about. Is it because > > the syscall did not return an error or the particular signal was > > blocked etc. > > Uh, ok; that's less comforting. Nevermind, I think I understand everything, now. epoll_pwai

Re: Strange issues with epoll since 5.0

2019-04-30 Thread Eric Wong
Deepa Dinamani wrote: > I was also not able to reproduce this. > Arnd and I were talking about this today morning. Here is something > Arnd noticed: > > If there was a signal after do_epoll_wait(), we never were not > entering the if (err = -EINTR) at all before. I'm not sure which `if' statemen

Re: Strange issues with epoll since 5.0

2019-04-30 Thread Deepa Dinamani
I was also not able to reproduce this. Arnd and I were talking about this today morning. Here is something Arnd noticed: If there was a signal after do_epoll_wait(), we never were not entering the if (err = -EINTR) at all before. But, now we do. We could try with the below patch: diff --git a/fs/

Re: Strange issues with epoll since 5.0

2019-04-29 Thread Eric Wong
Davidlohr Bueso wrote: > On Sun, 28 Apr 2019, Eric Wong wrote: > > > Just running one test won't trigger since it needs a busy > > machine; but: > > > > make test/mgmt_auto_adjust.log > > (and "rm make test/mgmt_auto_adjust.log" if you want to rerun) > > fyi no luck reproducing on both

Re: Strange issues with epoll since 5.0

2019-04-29 Thread Davidlohr Bueso
On Sun, 28 Apr 2019, Eric Wong wrote: Just running one test won't trigger since it needs a busy machine; but: make test/mgmt_auto_adjust.log (and "rm make test/mgmt_auto_adjust.log" if you want to rerun) fyi no luck reproducing on both either a large (280) or small (4 cpu) mac

Re: Strange issues with epoll since 5.0

2019-04-27 Thread Eric Wong
Deepa Dinamani wrote: > I tried to replicate the failure on qemu. > I do not see the failure with N=32. > Does it work for N < 32? Depends on number of cores you have; I have 4 cores, 8 threads with HT; so I needed to have a lot of load on the machine to get it to fail (it takes about 1 minute).

Re: Strange issues with epoll since 5.0

2019-04-27 Thread Deepa Dinamani
I tried to replicate the failure on qemu. I do not see the failure with N=32. Does it work for N < 32? Does any other signal work? Are there any other architectures that fail? Could you help me figure out how to run just the one test that is failing? -Deepa

Re: Strange issues with epoll since 5.0

2019-04-27 Thread Eric Wong
Eric Wong wrote: > Omar Kilani wrote: > > Hi there, > > > > I’m still trying to piece together a reproducible test that triggers > > this, but I wanted to post in case someone goes “hmmm... change X > > might have done this”. > > Maybe Davidlohr knows, since he's responsible for most of the > e

Re: Strange issues with epoll since 5.0

2019-04-24 Thread Davidlohr Bueso
On Wed, 24 Apr 2019, Davidlohr Bueso wrote: On Wed, 24 Apr 2019, Eric Wong wrote: Omar Kilani wrote: Hi there, I’m still trying to piece together a reproducible test that triggers this, but I wanted to post in case someone goes “hmmm... change X might have done this”. Maybe Davidloh

Re: Strange issues with epoll since 5.0

2019-04-24 Thread Davidlohr Bueso
On Wed, 24 Apr 2019, Eric Wong wrote: Omar Kilani wrote: Hi there, I’m still trying to piece together a reproducible test that triggers this, but I wanted to post in case someone goes “hmmm... change X might have done this”. Maybe Davidlohr knows, since he's responsible for most of th

Re: Strange issues with epoll since 5.0

2019-04-24 Thread Eric Wong
Omar Kilani wrote: > Hi there, > > I’m still trying to piece together a reproducible test that triggers > this, but I wanted to post in case someone goes “hmmm... change X > might have done this”. Maybe Davidlohr knows, since he's responsible for most of the epoll changes in 5.0. > Basically, s

Strange issues with epoll since 5.0

2019-04-15 Thread Omar Kilani
Hi there, I’m still trying to piece together a reproducible test that triggers this, but I wanted to post in case someone goes “hmmm... change X might have done this”. Basically, something’s broken (or at least, has changed enough to cause problems in user space) in epoll since 5.0. It’s still br