Hi Ludo,

Sorry for the delay.  I'm currently very busy.  Small summary follows:

>That’s on the bare metal, right?  It does look like the file system was
>indeed in a bad state and that we’re just confirming it?

Yes, but it would have stayed in the bad state forever because our
e2fsck was too old.

I think what happened (preliminary... but should be close enough):

There is code in ext4 that handles "orphan" inodes.
Those "orphan" inodes are inodes that no directory entry points to
anymore.

For example if you open a file "A" and then unlink it and keep it open,
the corresponding inode becomes an orphan, but cannot be GCed yet (since
you are using it).
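
To make that concrete, here is a minimal sketch in Python of how such an
orphan comes to exist from userspace.  The helper name make_orphan is
made up for illustration, and the link count of 0 is what I'd expect on
a typical Linux filesystem, not something specific to ext4:

```python
import os
import tempfile

def make_orphan():
    """Unlink a file while it is still open.

    Returns (link count, whether the path still exists, data read back).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                  # no directory entry points to the inode now
        nlink = os.fstat(fd).st_nlink    # link count as seen through the open fd
        exists = os.path.exists(path)    # the name is gone
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)          # the payload is still readable
        return nlink, exists, data
    finally:
        os.close(fd)                     # last user gone; the inode can be freed
```

Between the unlink and the close, the inode is in exactly the state
described above: in use, but unreachable from any directory.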

There is code in ext4 that eventually, once everyone has closed the
orphan, gets rid of it entirely, freeing the payload extents.

However, in case the computer crashes before that happens, ext4 also
remembers the set of orphans somewhere ON DISK.  That's so it can
find the orphans later (after booting again) and free them.

In a recent ext4 update in the kernel, the orphan handling grew a new
option, I think on by default, that stores this set of orphans in a
regular file (instead of in some weird metadata in the superblock as it
did before).
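
If you want to check whether a given filesystem has that option on, a
rough sketch follows.  Assumptions on my side: that "tune2fs -l" prints
a "Filesystem features:" line, and that the feature shows up there as
"orphan_file" (that's the name in recent e2fsprogs as far as I know).
The helper names and the SAMPLE text are made up for illustration:

```python
import subprocess

def parse_features(listing):
    """Return the set of feature names from `tune2fs -l`-style text."""
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()

def uses_orphan_file(listing):
    """True if the (assumed) orphan_file feature is listed."""
    return "orphan_file" in parse_features(listing)

def uses_orphan_file_on(device):
    """Ask tune2fs about a real device; usually needs root."""
    out = subprocess.run(["tune2fs", "-l", device], check=True,
                         capture_output=True, text=True).stdout
    return uses_orphan_file(out)

# Hypothetical excerpt, only here to show the shape of the input.
SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr extent orphan_file
"""
```

Only uses_orphan_file_on touches a real device; the parsing part can be
tried on any saved output.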

Now if you remount a filesystem read-only, the kernel cannot (well,
should not) actually update the contents of that orphan file (since it's
a regular file and you said "read-only" :P), so it can happen that the
on-disk set of orphans is incorrect.  In such a case, the kernel marks
the filesystem as dirty (earlier, it would just fail the remount ro--but
we didn't see that anymore).

Now the (old) e2fsck comes along and sees some weird floating inodes,
but it doesn't know what they are, so it leaves them alone.  It clears
the dirty flag anyway.

Eventually we make some more orphans, remount the fs ro, and there we
go: an endless filesystem corruption loop.

Anyway, after updating e2fsck the machine has had no more problems
(so far...).

Docs:
<https://www.kernel.org/doc/Documentation/filesystems/ext4/orphan.rst>

Kernel bugfix:
<https://lore.kernel.org/all/20240611142704.14307-1-luis.henriq...@linux.dev/>
