Hi,
On 2024-12-24 15:10, Simon Richter wrote:
> This should not make any difference in the number of write operations
> necessary, and only affect ordering. The data, metadata journal and
> metadata update still have to be written.
I would expect that some reordering allows fewer actual physical write
operations to happen, i.e. writes to the same or neighbouring blocks get
merged/grouped (possibly by the hardware if not the kernel), which would
make a difference for both the performance of spinning devices (fewer
seeks) and the longevity of solid-state devices (as these have larger
physical blocks), but I don't know if that's actually how it works in
this case.
One way to know would be to benchmark what actually happens nowadays with
and without --force-unsafe-io to get some actual numbers to weigh in the
decision to make the change or not.
It would be surprising, though, for the dpkg man pages (among other
places) to talk about performance degradation if it were not real.
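
A rough sketch of what such a micro-benchmark could look like. It does
not run dpkg itself; it only emulates the write-to-temporary-file-then-
rename pattern with and without fsync(), so it is a simplification and
not what dpkg does internally. The names install_files, compare and
base_dir are made up for illustration, and base_dir has to point at the
filesystem under test (the default temporary directory is often a tmpfs,
where fsync() costs almost nothing).

```python
import os
import tempfile
import time


def install_files(directory, count, payload, use_fsync):
    """Write `count` files via write-to-temp-then-rename.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    for i in range(count):
        final_path = os.path.join(directory, f"file{i}")
        tmp_path = final_path + ".dpkg-new"  # suffix chosen to mimic dpkg
        fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, payload)
            if use_fsync:
                os.fsync(fd)  # force the data out before the rename
        finally:
            os.close(fd)
        os.replace(tmp_path, final_path)  # atomic rename over the final name
    return time.perf_counter() - start


def compare(count=200, size=4096, base_dir=None):
    """Time both variants, each in a fresh temporary directory."""
    payload = b"x" * size
    results = {}
    for use_fsync in (True, False):
        with tempfile.TemporaryDirectory(dir=base_dir) as d:
            results[use_fsync] = install_files(d, count, payload, use_fsync)
    return results


if __name__ == "__main__":
    timings = compare()
    print(f"with fsync:    {timings[True]:.3f}s")
    print(f"without fsync: {timings[False]:.3f}s")
```

Wall-clock time alone would not show the number of physical writes; for
that, the device counters before and after each run would have to be
compared as well.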
> The only way this could be improved is with a filesystem level
> transaction, where we can ask the file system to perform the entire
> update atomically -- then all the metadata updates can be queued in
> RAM, held back until the data has been synchronized by the kernel in
> the background, and then added to the journal in one go. I would expect
> that with such a file system, fsync() becomes cheap, because it would
> just be added to the transaction, and if the kernel gets around to
> writing the data before the entire transaction is synchronized at the
> end, it becomes a no-op.
That sounds interesting. But — do we have filesystems on Linux that can
do that already, or is this still a wishlist item? Also worth noting, at
least one well-known implementation in another OS was deprecated [1]
citing complexity and lack of popularity as the reasons for that
decision, and the feature is missing in their next-gen FS. So maybe it's
not that great after all?
Anyway, in the current toolbox, besides --force-unsafe-io, we also have:
- volume or FS snapshots, for similar or better safety but not the
automatic performance gains; probably not (yet?) available on most
systems
- the auto_da_alloc ext4 mount option that AIUI should do The Right
Thing in dpkg's use case even without the fsync, actual reliability and
performance impact unknown; appears to be set by default on trixie
- eatmydata
- io_uring, which allows asynchronous file operations; implementation
would require major changes in dpkg; potential performance gains in
dpkg's use case are not yet evaluated AFAIK but it looks like the right
solution for that use case.
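
For the auto_da_alloc item, here is a minimal sketch of how one could
check whether a given ext4 mount still has it enabled. It assumes that
/proc/mounts only lists the option when it has been turned off (as
noauto_da_alloc), which is my understanding of how ext4 reports
non-default options but should be verified on the target kernel. The
function name ext4_auto_da_alloc_status is made up for illustration.

```python
def ext4_auto_da_alloc_status(mounts_text, mount_point):
    """Inspect /proc/mounts-style text for one mount point.

    Returns True if the mount is ext4 without noauto_da_alloc,
    False if it is ext4 with noauto_da_alloc, and None if the mount
    point is not listed or is not ext4.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        _device, target, fstype, options = fields[:4]
        if target != mount_point:
            continue
        # A later entry for the same mount point overrides an earlier one.
        if fstype != "ext4":
            result = None
        else:
            result = "noauto_da_alloc" not in options.split(",")
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(ext4_auto_da_alloc_status(f.read(), "/"))
```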
BTW, for those interested in reading a bit more about the historical and
current context around this issue (aka O_PONIES), I'm adding a few links
at [2].
> but e.g. puppet might become confused
Heh. Ansible wins again.
> So no, we cannot drop the fsync(). :\
Nowadays, most machines are unlikely to be subject to power failures at
the worst time: laptops and other battery-powered mobile devices have
replaced desktop PCs in many workplaces and homes, and machines in
datacenters usually have redundant power supplies and battery and
generator backups. And the default filesystem for new installations,
ext4, is mounted with auto_da_alloc by default, which should make
dropping the fsync safe, but whether that will result in significant
performance gains is IMO something to be tested.
If the measured performance gain makes it worthwhile to drop the fsync,
maybe this could become a configuration item that is set automatically
in most cases by detecting the machine type (battery, dual PSU,
container, VM => drop fsync) and filesystem (safe fs and mount options
=> drop fsync) or by asking the user in other cases or in expert install
mode, defaulting to the safer --no-force-unsafe-io.
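
To make the idea a bit more concrete, here is a rough sketch of the
decision logic only. All names are hypothetical, nothing like this
exists in dpkg or the installer, and I'm assuming that either condition
(protected power or a safe filesystem) is enough on its own to drop the
fsync. Battery detection reads /sys/class/power_supply, which can also
list batteries of peripherals, so a real check would need to be more
careful; dual PSU, container and VM detection are left out entirely.

```python
import os


def has_battery(power_supply_dir="/sys/class/power_supply"):
    """Return True if any power supply entry reports type 'Battery'."""
    try:
        entries = os.listdir(power_supply_dir)
    except OSError:
        return False  # no sysfs power_supply class available
    for name in entries:
        try:
            with open(os.path.join(power_supply_dir, name, "type")) as f:
                if f.read().strip() == "Battery":
                    return True
        except OSError:
            continue  # entry without a readable 'type' attribute
    return False


def choose_unsafe_io(power_protected, fs_safe, user_answer=None):
    """Decide whether to default to --force-unsafe-io.

    power_protected: machine type makes sudden power loss unlikely
    fs_safe:         filesystem and mount options are considered safe
    user_answer:     True/False if the user was asked, None otherwise
    """
    if power_protected or fs_safe:
        return True
    if user_answer is not None:
        return user_answer
    return False  # fall back to the safer --no-force-unsafe-io


if __name__ == "__main__":
    print(choose_unsafe_io(has_battery(), fs_safe=False))
```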
Cheers,
[1]: https://learn.microsoft.com/en-us/windows/win32/fileio/deprecation-of-txf
[2]: https://lwn.net/Articles/351422/
https://lwn.net/Articles/322823/
https://lwn.net/Articles/1001770/
--
Julien Plissonneau Duquène