On Sun, Nov 5, 2017 at 12:46 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Sat, Nov 4, 2017 at 5:39 PM, Rob Landley <r...@landley.net> wrote:
>> On 11/03/2017 08:07 PM, Linus Torvalds wrote:
>>>>
>>>> But it boils down to "got the limit wrong, the exec failed after the
>>>> fork(), dynamic recovery from which is awkward so I'm trying to figure
>>>> out the right limit".
>>
>> Sounds later like dynamic recovery is what you recommend. (Awkward
>> doesn't mean I can't do it.)
>
> So my preferred solution would certainly be for xargs to realize that
> _SC_ARG_MAX is at most an "informed guess" rather than some absolute
> value.
>
> Or, alternatively, simply use a _SC_ARG_MAX that is small enough that
> we can say "yes, that's what we've always supported". That, in
> practice, is the 128kB value.
>
> It's probably also big enough that nobody cares any more. Sure, you
> *could* feed bigger arguments, and we do end up basically supporting
> it because people who don't write proper scripts will simply do
>
>     grep some-value $(find . -name '*.c')
>
> and that works pretty well if your code-base isn't humongous.
>
> It still works for the kernel, for example, but there's no question
> that it's still not something we *guarantee* works.
>
> It doesn't work if you don't limit it to *.c files:
>
>     [torvalds@i7 linux]$ grep some-value $(find .)
>     -bash: /usr/bin/grep: Argument list too long
>
> so when the kernel expanded the argument list from the traditional
> 128kB, it was for _convenience_, it was not meant for people who do
> proper scripting and use xargs.
>
> See the difference?
>
>>> What _is_ the stack limit when using toybox? Is it just entirely unlimited?
>>
>> Answer to second question on ubuntu 14.04:
>>
>> landley@driftwood:~/linux/linux/fs$ ulimit -s 999999999
>> landley@driftwood:~/linux/linux/fs$ ulimit -s
>> 999999999
>
> Oh, I can do that on Fedora too.
>
> But the thing is, nobody actually seems to do that.
>
> And our regression rule has never been "behavior doesn't change".
> That would mean that we could never make any changes at all.
>
> For example, we do things like add new error handling etc all the
> time, which we then sometimes even add tests for in our kselftest
> directory.
>
> So clearly behavior changes all the time and we don't consider that a
> regression per se.
>
> The rule for a regression for the kernel is that some real user
> workflow breaks. Not some test. Not a "look, I used to be able to do
> X, now I can't".
>
> So that's why this whole "is this a test or a real workload" question
> is important to me, and what odd RLIMIT_STACK people actually use.
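(For concreteness, a minimal C sketch of that "small enough that we've
always supported it" policy: ask sysconf(_SC_ARG_MAX) and RLIMIT_STACK/4
what they think, but never trust more than the traditional 128 KiB. The
function name conservative_arg_max and the exact clamping policy are
illustrative assumptions, not code from xargs, toybox, or bionic.)

    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    /* Pick a conservative argv+envp budget for one exec. */
    static long conservative_arg_max(void)
    {
        long limit = sysconf(_SC_ARG_MAX);   /* libc's idea of the limit */
        struct rlimit stack;

        if (getrlimit(RLIMIT_STACK, &stack) == 0 &&
            stack.rlim_cur != RLIM_INFINITY) {
            /* Modern kernels size the limit from the stack rlimit / 4. */
            long quarter = (long)(stack.rlim_cur / 4);
            if (limit < 0 || quarter < limit)
                limit = quarter;
        }

        /* Treat anything above the traditional 128 KiB as an informed guess. */
        if (limit < 0 || limit > 128 * 1024)
            limit = 128 * 1024;
        return limit;
    }

    int main(void)
    {
        printf("usable argv+envp budget: %ld bytes\n", conservative_arg_max());
        return 0;
    }

A tool that sticks to this budget shouldn't see E2BIG after the fork in
the first place, at the cost of a few more exec calls for huge inputs.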
Android's default RLIMIT_STACK is 8192 KiB (8 MiB). The specific bug
report was from an interactive shell user. I'm not aware of any part of
the platform itself that relies on this, nor any third-party app.

> I'm not interested in "can you reproduce it", because that's simply
> not the issue for us from a regression standpoint.
>
> That said, I tried this under Fedora:
>
>     ulimit -s unlimited
>     find / -print0 2> /dev/null | xargs -0 /usr/bin/echo | wc
>
> and it reported 991 lines. That would match just using 128kB as the
> breaking point for xargs.
>
> So apparently at least Fedora doesn't seem to use that "stack limit /
> 4" thing, but the traditional 128kB value.
>
> I don't know if that is because of 'xargs', or because of library
> differences - I didn't check.
>
> And I have a hard time judging whether the regression report is a
> "look, this behavior changed", or a "we actually have a real
> regression visible to actual users".
>
> See above why that matters to me.
>
>> It sounds like "probe and recover" is my best option.
>
> I actually suspect "just use 128kB" is the actual best option in practice.

For libc's sysconf(_SC_ARG_MAX) too? I'm fine changing bionic back to
reporting 128 KiB if there's an LKML "Linus says" mail that I can link
to in the comment. It certainly seems like an overly conservative choice
is better than the current situation...

> Sure, "probe and recover" is the best option in theory, but it's a lot
> more complex, and there doesn't actually seem to be a lot of upside.
>
> So to take my silly example of piping "find /" to xargs: I can't
> imagine that anybody sane should care really whether it results in
> 900+ execve's (for that 128kB limit) or just 20 (for some "optimal"
> 6MB limit).
>
> And there is actually a somewhat real fear with the
> "probe-and-recover" model: the already mentioned nasty case "sometimes
> we don't return E2BIG, we might just succeed the execve, but then kill
> the process very early due to some _other_ limit being triggered".
>
> That nasty case really shouldn't happen, but it was the case we
> considered (and rejected) for suid binaries.
>
> So that's the real reason the kernel behavior changed: we had a
> security issue with the "we allow basically unlimited stack growth"
> where a huge argv/envp could be used to grow the stack into some other
> segment.
>
> Realistically, it was only a security issue with suid binaries, but as
> mentioned in this thread, the problem is that we do the argument
> copying even before we've decided whether the execve is going to be a
> suid execve.
>
> So then - exactly so that we do *not* have that nasty case of "execve
> succeeds, but we kill the process immediately if it turns into a
> potential security issue", we do that "limit the stack growth for
> everybody".
>
> Linus

-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
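(For comparison, a minimal C sketch of the "probe and recover" model
discussed above: try the exec, and if the kernel answers E2BIG, halve
the batch and retry. The helper run_batches and its exit-code
convention are made up for illustration; a real xargs/toybox
implementation would need more careful error handling, e.g. a command
that legitimately exits 126 would confuse this sketch.)

    #include <errno.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Run cmd over all of args[], shrinking the per-exec batch on E2BIG.
     * Exit code 126 is an arbitrary convention meaning "exec failed with
     * E2BIG"; 127 means any other exec failure. */
    static int run_batches(char *cmd, char **args, size_t count)
    {
        while (count > 0) {
            size_t batch = count;

            for (;;) {
                pid_t pid = fork();
                if (pid < 0) return -1;
                if (pid == 0) {
                    /* Child: build argv = { cmd, args[0..batch), NULL }. */
                    char **argv = calloc(batch + 2, sizeof(*argv));
                    if (!argv) _exit(127);
                    argv[0] = cmd;
                    for (size_t i = 0; i < batch; i++) argv[i + 1] = args[i];
                    execvp(cmd, argv);
                    _exit(errno == E2BIG ? 126 : 127);
                }
                int status;
                if (waitpid(pid, &status, 0) < 0) return -1;
                if (WIFEXITED(status) && WEXITSTATUS(status) == 126 && batch > 1) {
                    batch /= 2;   /* probe failed: recover by retrying with less */
                    continue;
                }
                break;            /* batch ran (or a single argument is hopeless) */
            }
            args += batch;
            count -= batch;
        }
        return 0;
    }

As noted in the thread, this only recovers from the E2BIG case; it
cannot recover from the nastier case where the execve succeeds but the
process is killed almost immediately by some other limit.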