On Wed, Feb 9, 2011 at 1:18 PM, Bob Proulx <b...@proulx.com> wrote:
>
> Since the exit status of /bin/true is ignored then I think that test
> case is flawed. I think at the least needs to check the exit status
> of the /bin/true process.
>
>   bash -c 'while true; do /bin/true || exit 1; done'
The "|| exit 1" doesn't make any sense. If you seriously claim that that is needed for ^C to work reliably, you're just totally mistaken. Your whole premise that you should look at the error return code is total and utter crap.

Lookie here:

    while : ; do sleep 1; done

which is *exactly* the same case, and dammit, if ^C doesn't break out of that loop, then the shell is a broken POS. Agreed? If you tell me that it needs a "|| exit 1", you're just broken. Try it.

And now go back to the original case. The same "^C should break out" is true when you replace "sleep" with "/bin/true" or with anything else. It had better break out every single time, on the first try. And it really doesn't. And it's a bash bug.

I don't understand why bash people can't accept that. People even debugged it to the particular line of source code in bash. I just tried it:

    [torvalds@i5 ~]$ while : ; do /bin/sleep 1; done
    ^C
    [torvalds@i5 ~]$ while : ; do /bin/true; done
    ^C^C^C
    [torvalds@i5 ~]$ while : ; do /bin/true; done
    ^C
    [torvalds@i5 ~]$ while : ; do /bin/true; done
    ^C
    [torvalds@i5 ~]$ while : ; do /bin/true; done
    ^C^C^C

and the thing to notice is that it clearly is very much about some race condition. Sometimes it works on the first try, sometimes it doesn't.

Why are you arguing? Why are you bringing up totally idiotic arguments, while others are ignoring it because they can't reproduce it? There were people who reproduced this on OS X too, btw, so it clearly is not a Linux issue, even if you put your blinders on and ignore the fact that it was already root-caused by Oleg.

The problem is that 'set_job_status_and_cleanup()' does this:

    if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) && ..

which just looks totally buggy and racy. There's even a comment about it in the bash source code, for chrissake!
Here's the scenario:

- wait_for() sets wait_sigint_received to zero (look for the comment here!), and installs the sigint handler

- it does other things too, but it does waitchld() that does the actual waitpid() system call

- now, imagine the following scenario: the ^C happens just as the child already exited successfully!

- so bash itself gets the sigint, and sets wait_sigint_received to 1

So what happens? child->status will be successful (the child was not interrupted by the signal, it exited at just the right time), but bash saw the SIGINT. But because it thinks it needs to see *both* the sigint _and_ the WTERMSIG(child->status)==SIGINT, bash essentially ignores the ^C.

Note how bash magically would have worked correctly if the child process had taken one extra millisecond, and also seen the ^C and died of it. Notice how bash acts differently based on that millisecond difference? So it's a bug.

Please don't make inane and incorrect excuses for it ("you didn't have an '|| exit 1' there"), and please don't say "I can't reproduce it". Even without reproducing it, just looking at the source code should be good enough, no? Especially as Oleg already pinpointed the exact line for you.

Now, it does look like the problem is at least partly because bash has a horrible time trying to figure out a truly ambiguous case: did the child process explicitly ignore the ^C or not? It looks like bash is basically trying to ignore the ^C in the case where the child ignored it. I think that's misguided, but that does seem to be what bash is trying to do.

It's misguided exactly because there is absolutely no way to know whether the child returned successfully because it just happened to exit just before the ^C came in, or whether it blocked ^C and ignored it. So even _trying_ to make that judgement call seems to be a bad idea.

And no, I don't know bash sources all that well.
I played around with them a long time ago, and for this I only glanced at it quickly to get more of a view into what bash is trying to do (all thanks should go to Oleg who already pinpointed the line that breaks).

Maybe there are subtle issues, maybe there are broken historical shell semantics here. But please don't ignore this bug just because you cannot reproduce it.

                 Linus