> Any chance you could upgrade to either the Debian kernel or one based
> on the Debian kernel or on 4.19 mainline and see if any of them fix it?
>
> Also, I suggest you try to get those quirks fixed in Linux mainline,
> so that you don't have to keep building Linux yourself :)

I'm afraid upgrading would be difficult. The use-case is one of fairly
entrenched networking for my company, so there's significant
regression-testing required to even move further within the same minor
version.

The quirks are also somewhat difficult to generalise: some of the hardware
is rather antiquated, requiring driver hacks for initialisation, and many of
the other changes are similarly niche and intended only as case-specific
optimisations. (It's divergent in much the same way that much of the OpenWRT
project's kernel patches will never be suitable for mainline, but we do
submit bugfixes and stuff when we can.)

I was able to confirm that the problem doesn't occur with Debian's 4.19
series on unrelated hardware, however, and it looks like the problem has
been resolved for a while in stock kernels.

> Do you have any details about which patch this is?

https://lists.ubuntu.com/archives/kernel-team/2018-May/092723.html

I wasn't able to find an equivalent in Debian's patchsets, which led me to
check upstream; more on that below.

> Also, it would be great if you could try to get the patch into the
> Linux kernel's mainline stable releases.

After digging around for a while, it looks like this may be a side-effect of
how LTSI works, though I'll report it anyway, in hopes that it can be
addressed without violating the "don't break the userspace ABI" policy.

Any semi-recent kernel version should be unaffected. The following patch
adds a "NoNewPrivs" line to /proc/<pid>/status, where the blank appears in
4.4. It doesn't look like it was ever backported into the tree, or, rather,
it seems to be the case that retpoline logic, which assumed there was text
right after the capabilities block, was backported directly, leading to the
gap.

NoNewPrivs patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=af884cd4a5ae62fcf5e321fecf0ec1014730353d

In any case, this is definitely an issue that should be fixed in (specific
versions of) the kernel.

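For reference, a reader of /proc/<pid>/status can cope with the blank line
by skipping anything that doesn't look like "Key: value". This is only a
minimal sketch of that idea; the names parse_status and read_status are
hypothetical and this is not iotop's actual code:

```python
def parse_status(text):
    """Parse the text of /proc/<pid>/status into a dict.

    Lines without a "Key:" prefix (such as the stray blank line seen
    on the affected 4.4 kernels) are skipped rather than treated as errors.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if not sep or not key.strip():
            # Blank or otherwise unexpected line: ignore it.
            continue
        fields[key.strip()] = value.strip()
    return fields


def read_status(pid):
    # Hypothetical helper: read and parse the status file for one process.
    with open('/proc/%d/status' % pid) as f:
        return parse_status(f.read())
```

With this approach an empty line in the middle of the file just drops out
of the result and the fields on either side are still picked up.
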
I still think iotop should be synced to gain tolerance to unexpected input,
but it isn't a Debian-specific problem in light of what's been discussed
here.

Thanks for the sanity-checks here.

-Neil