On Tue, Oct 06, 2020 at 09:35:33AM -0400, Theodore Y. Ts'o wrote: > On Mon, Oct 05, 2020 at 10:03:06PM -0700, Josh Triplett wrote: > > I'm not trying to create a problem here; I'm trying to address a whole > > family of problems. I was generally under the impression that mounting > > existing root filesystems fell under the scope of the kernel<->userspace > > or kernel<->existing-system boundary, as defined by what the kernel > > accepts and existing userspace has used successfully, and that upgrading > > the kernel should work with existing userspace and systems. If there's > > some other rule that applies for filesystems, I'm not aware of that. > > (I'm also not trying to suggest that every random corner case of what > > the kernel *could* accept needs to be the format definition, but rather, > > cases that correspond to existing userspace.) > > I'm not opposed to the kernel side change; it's *this time*.
I appreciate that. (Does that mean you wouldn't object to this patch going into 5.9, with Jens' Reviewed-by and your ack?) > I'm more interested in killing off the tool that generated the > malformed file system in the first place. It was developed for a reason, and it's not intended for general-purpose use. It serves its purpose, which has nothing to do with e2fsprogs or any component of e2fsprogs. It does not simply create filesystem images, in the way that mke2fs or e2fsdroid does. If I'd found any existing code that would have worked, or could have been adapted to work, I would have happily reused it; I don't like reinventing the wheel. Also, see below regarding e2fsck. > As I keep pointing out, things aren't going to go well if "e2fsck -E > unshare_blocks" is applied to it. These aren't intended to *ever* become writable. I've ensured that they're marked with EXT4_FEATURE_RO_COMPAT_READONLY as well, which I would hope implies that. Would that be sufficient to convince e2fsck that it should never attempt to apply unshare_blocks to it? Also, it seems like most of block_validity is trying to carefully avoid metadata blocks intersecting with the journal, which could cause all manner of issues; the filesystems I'm working with have no journal (because they're permanently read-only). With the sole exception of the shared bitmap block, I have it as a requirement to pass e2fsck with zero complaints. If you'd be willing to take a patch to e2fsck along the same lines as this kernel patch, then I'd be happy to add it to the regression test suite: no filesystem images that e2fsck is unhappy with. Would that help? As mentioned before, I'd also be happy to supply some representative filesystem images. > We had a similar issue with Android. Many years ago, Andy Rubin was > originally quite allergic to the GPL, and had tried to promulgate the > rule, "no GPL in Android Userspace". This is why bionic is used as > libc, and this resulted in Android engineers (I think before the > Google acquisition, but I'm not 100% sure), creating an unofficial, > "unauthorized" make_ext4fs which was a BSD-licensed version of mke2fs. That is indeed a sad reason to develop anything. It's also particularly sad to develop a different tool to serve exactly the same purpose as an existing tool. As opposed to, in this case, developing a different piece of software to serve different purposes, that happens to make use of ext4 as one component. > Unfortuantely, it created file systems which the kernel would never > complain about, but which, 50% of the time, would result in a file > system which under some circumstances, would get corrupted (even more) > when e2fsck attempted to repair the file system. So if a user had a > bit flip caused by an eMMC hiccup, e2fsck could end up making things > worse. I'd be interested in reading more about that, if you could suggest a link or the right search terms. I did find https://bugzilla.kernel.org/show_bug.cgi?id=195561 , which references the issues in the same way you've done here, but I didn't find any of the specific details you're mentioning here about what made the images it created fragile. I did see you mention, there, the advice of making sure to check filesystems with e2fsck. That's something I've done from day one: I didn't stop until e2fsck was happy, not just the kernel. > So that's why I really don't like it when there are "unauthorized", > unofficial tools creating file systems out there which we are now > obliged to support. ext4 is an incredibly powerful and performant filesystem, with a reasonably well-documented format and many useful properties. Not all software that works with ext4 is going to live in e2fsprogs, nor should it need to. This would be analogous to the notion that (say) the FreeBSD kernel belongs in the e2fsprogs repository because it has an ext4 driver. The tail would be wagging the dog. > Even if it's OK as far as the kernel is > concerned, unless you're planning on forking and/or reimplementing all > of e2fsprogs, and doing so correctly, that way is going to cause > headaches for file system developers. I haven't recreated or reimplemented e2fsprogs, and I have no desire to do so. It seems like, as a result of the make_ext4fs debacle, you're seeing any other software working with ext4 as an instance of reinventing the wheel (and failing to make that wheel round, because hey, it still rolls). That's not the case here. > > I don't *want* to rely on what apparently turned out to be an > > undocumented bug in the kernel's validator. That's why I was trying to > > fix the issue in what seemed like the right way, by detecting the > > situation and turning off the validator. That seemed like it would fully > > address the issue. If it would help, I could also supply a tiny filesystem > > image for regression testing. > > I'm OK with working around the problem, and we're lucky that it's this > simple.... this time around. > > But can we *please* take your custom tool out back and shoot it in the > head? Nope. As mentioned, this isn't about creating ext4 filesystem images, and it isn't even remotely similar to mke2fs. > And perhaps we need to make a policy that makes it clear that for file > systems, it's not just about "whatever the kernel happens to accept". > It should also be, "was it generated using an official version of the > userspace tools", at least as a consideration. I don't think that's a reasonable policy at all, no. "Should pass e2fsck" would be a relatively reasonable policy, assuming that e2fsck doesn't ratchet up strictness in a way that breaks existing filesystems. I think it'd be reasonable to recommend against trying to push the boundaries of what the kernel accepts but e2fsck doesn't, without a concrete plan to improve e2fsck and the filesystem documentation. - Josh Triplett