Re: desire for journaled filesystem

Janne Johansson Thu, 07 Sep 2023 23:03:59 -0700

On Fri, 8 Sep 2023 at 03:47, Steve Litt <sl...@troubleshooters.com> wrote:
>
> My main computer is Void Linux. If I had to restore from backup every
> time the disks became mildly messed up, all my time would be spent
> backing up and restoring.
>
> I remember back in the 90's and early 00's before journalling every
> system crash was grounds for an ulcer.

Then again, ext2/3/4 run in async mode for all operations, which is
why e2fsck takes such a long time. The act of creating a new file
needs at least four operations (allocating space for contents, adding
a filename entry to the directory, creating an inode for metadata and
writing out the actual contents).

If you run async file systems, these can happen in any random order,
and if you have a crash while files are being created (and deleted)
any of these may or may not have happened. BSD ffs does these mostly
in order (where softdep can change/delay some of them) which means
that fsck for ffs can know that if step 3 isn't done, step 4 will not
have started either.
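To put rough numbers on that difference, here is a small sketch that
counts the on-disk states a checker would have to consider after a
crash during one file creation. The step labels and their order are
taken from the list above purely for illustration; real ffs and ext
internals are more involved than this.

```python
from itertools import combinations

# The four steps named above, in an assumed (illustrative) order.
STEPS = (
    "allocate space",
    "add directory entry",
    "create inode",
    "write contents",
)

def ordered_crash_states(steps):
    """Ordered writes: a crash leaves some prefix of the steps on disk."""
    return [steps[:n] for n in range(len(steps) + 1)]

def async_crash_states(steps):
    """Unordered writes: any subset of the steps may have reached disk."""
    return [c for n in range(len(steps) + 1) for c in combinations(steps, n)]

if __name__ == "__main__":
    print(len(ordered_crash_states(STEPS)), "possible states when ordered")
    print(len(async_crash_states(STEPS)), "possible states when async")
```

With ordering, knowing that one step is missing rules out every later
step. Without ordering, each step has to be checked on its own.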

For e2fsck, all possible combinations must be explored. Adding to
this, ext filesystems don't seem to have any kind of way to express "I
found an unchecked error so I am in need of a detailed fsck", which is
why dists using ext2 would rely on "magic" files, touching /autofsck
and later removing it, to indicate whether the last shutdown was good
or bad.

Even with this simplistic method, they would STILL force fsck every
100 days or 58 reboots, because well, you can't tell if there ever was
an error during the last 100 days, since there is no method to mark
the known-broken fs as needing fsck.
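The decision described above boils down to something like the sketch
below. The function name and parameters are made up for illustration;
only the 100 day and 58 reboot limits come from the text, and this is
not how any particular init script actually spells it.

```python
def needs_fsck(marker_present, boots_since_check, days_since_check,
               max_boots=58, max_days=100):
    """Hypothetical boot-time check: force fsck on a leftover marker
    file, or when either the boot count or the age limit is reached."""
    if marker_present:
        # The marker file was never removed, so the last shutdown
        # is assumed to have been unclean.
        return True
    # No marker, but with no way to record "known broken" the only
    # safe fallback is to check on a schedule.
    return boots_since_check >= max_boots or days_since_check >= max_days

if __name__ == "__main__":
    print(needs_fsck(False, 3, 10))
    print(needs_fsck(True, 3, 10))
    print(needs_fsck(False, 58, 10))
```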

In the light of this, the need for a journal (even at the cost of
slightly more IO at times) becomes obvious. The fine folks over at the
penguin camp would rather write to a journal "I am about to create
/tmp/tmp.FSGSGRg3", then send those four operations, then clear the
journal entry again, just so the middle four ops can be async, than
"suffer" some ordering in the file system operations.
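As a toy model of that intent-then-clear pattern, the sketch below
keeps everything in memory. The class and method names are invented
for illustration, the journal write is simply assumed to be durable,
and recovery rolls the half-made file back, whereas real journals
typically replay the logged operations instead.

```python
STEPS = ("allocate space", "add directory entry",
         "create inode", "write contents")

class ToyJournaledFS:
    """In-memory toy, not a model of any real ext3/ext4 code."""

    def __init__(self):
        self.journal = []  # pending intents, assumed written synchronously
        self.files = {}    # file name -> set of steps that completed

    def create(self, name, crash_after=None):
        """Create a file; crash_after=N simulates a crash once N steps ran."""
        self.journal.append(name)              # 1. record the intent
        done = self.files.setdefault(name, set())
        for i, step in enumerate(STEPS):       # 2. the four operations
            if crash_after is not None and i >= crash_after:
                return False                   # crash: intent stays logged
            done.add(step)
        self.journal.remove(name)              # 3. clear the intent
        return True

    def recover(self):
        """After a crash, only files named in the journal need attention."""
        for name in list(self.journal):
            self.files.pop(name, None)         # drop the half-made file
            self.journal.remove(name)

if __name__ == "__main__":
    fs = ToyJournaledFS()
    fs.create("/tmp/a")
    fs.create("/tmp/b", crash_after=2)
    print("needs recovery:", fs.journal)
    fs.recover()
    print("files after recovery:", sorted(fs.files))
```

The point of the toy is that recovery looks at the journal only, so
it never has to scan every file for every possible combination.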

Now, BSD can run softdep which speeds some writes up, at some cost and
some added risk, and you can certainly mount async and have really
large risks added, but for each of those two steps, I would make very
sure that I had either useless data, or (as suggested) good backups in
place.

As Nick wrote, BSD people tend to like the fact that when your IO
subsystem says "the data is on the disk", it actually is there. Ext4
had a nice period* when "on the disk" meant "it will be on disk in 2
and a half minutes" even for atomic operations. You can imagine how
many people managed to have issues or lose power in the span of 150
seconds. I think they shortened the time, but the amount of tears
needed for the "go fast even if you go in the wrong direction" crowd
to change their minds was quite large.
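For what it's worth, an application that wants "on the disk" to mean
exactly that, regardless of file system, has to ask for it. Below is
a minimal sketch for POSIX systems; the helper name durable_write is
made up, and whether the device itself honours the flush is outside
what this code can promise.

```python
import os

def durable_write(path, data):
    """Write bytes to path, returning only after the OS was asked to
    flush both the file and its directory entry (POSIX systems)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push it to the device
    # Sync the containing directory too, so the new name is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

if __name__ == "__main__":
    durable_write("example.txt", b"hello\n")
```

This is slower than just calling write() and hoping, which is the
same trade-off discussed above.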

To me, it is like USB writing speeds. OpenBSD will have dog slow
speed, but it will also allow you to unmount the device when the write
is finished. Other common OSes will tell you "done!" in a few seconds
while the stick is still blinking, and when you ask to unmount it
still takes a long time, because it was just lying to you about the
writes being finished. If I am to wait 30 seconds to write a large ISO
to my stick, I'd rather have the computer show me it is working,
instead of it hoping I would write the file in "three" seconds and
then read comics for 27 seconds before unmounting so I don't notice
the discrepancy.

*)
https://www.pointsoftware.ch/2014/02/05/linux-filesystems-part-4-ext4-vs-ext3-and-why-delayed-allocation-is-bad/

--
May the most significant bit of your life be positive.
