Hi,

On 2024-12-24 15:10, Simon Richter wrote:

> This should not make any difference in the number of write operations necessary, and only affect ordering. The data, metadata journal and metadata update still have to be written.

I would expect that some reordering makes it possible for fewer physical write operations to happen, i.e. writes to the same or neighbouring blocks get merged or grouped (possibly by the hardware if not by the kernel). That would make a difference both for the performance of spinning devices (fewer seeks) and for the longevity of solid-state devices (as these have larger physical blocks). I don't know whether that is actually how it works in this case, though.
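To illustrate the kind of merging I have in mind, here is a toy model in Python. It is only a sketch: it assumes that every fsync() forces the blocks dirtied so far out as one batch, and that neighbouring blocks within a batch are merged into a single write. The function name and the block numbers are made up, and the real block layer is far more involved than this.

```python
def count_write_ops(batches):
    """Count merged writes: one per run of consecutive block numbers in each batch."""
    ops = 0
    for batch in batches:
        blocks = sorted(set(batch))  # rewriting a block within a batch costs nothing extra
        for i, block in enumerate(blocks):
            # A new physical write starts whenever the run of neighbours is broken.
            if i == 0 or block != blocks[i - 1] + 1:
                ops += 1
    return ops


if __name__ == "__main__":
    # Hypothetical blocks dirtied by three unpacked files.
    per_file = [[10, 11], [12, 13], [11, 14]]
    # One batch per file, as if fsync() were called after each file.
    print("fsync per file:", count_write_ops(per_file))
    # A single batch, as if everything were flushed once at the end.
    print("single flush:  ", count_write_ops([sum(per_file, [])]))
```

In this made-up case the per-file batches need four writes and the single batch needs one, but that only shows what the hypothesis would predict, not what the kernel really does.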

One way to find out would be to benchmark what actually happens nowadays with and without --force-unsafe-io, to get some actual numbers to weigh in the decision on whether to make the change.
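As a rough starting point, a micro-benchmark along these lines could give a first idea. This is a sketch in Python, not a replacement for timing real dpkg runs: it assumes the interesting pattern is "write a temporary file, optionally fsync() it, then rename it over the final name", and the function name, file count and payload size are arbitrary choices of mine.

```python
import os
import tempfile
import time


def replace_files(directory, count, payload, do_fsync):
    """Create `count` files via write-to-temp-then-rename; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        final = os.path.join(directory, f"file{i}")
        tmp = final + ".dpkg-new"  # suffix used for illustration only
        with open(tmp, "wb") as f:
            f.write(payload)
            if do_fsync:
                f.flush()             # push Python's buffer to the kernel
                os.fsync(f.fileno())  # ask the kernel to write the data to the device
        os.rename(tmp, final)         # replace the final name in one step
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = b"x" * 4096
    for do_fsync in (True, False):
        # The default temporary directory may be a tmpfs; pass dir=... to
        # TemporaryDirectory to target the filesystem actually being tested.
        with tempfile.TemporaryDirectory() as d:
            elapsed = replace_files(d, 200, payload, do_fsync)
            print(f"fsync={do_fsync}: {elapsed:.3f}s")
```

The numbers will depend heavily on the device, the filesystem and its mount options, so they only mean something when compared on the same machine.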

It would be surprising, though, if the dpkg man pages (among other places) talked about performance degradation that was not real.

> The only way this could be improved is with a filesystem level transaction, where we can ask the file system to perform the entire update atomically -- then all the metadata updates can be queued in RAM, held back until the data has been synchronized by the kernel in the background, and then added to the journal in one go. I would expect that with such a file system, fsync() becomes cheap, because it would just be added to the transaction, and if the kernel gets around to writing the data before the entire transaction is synchronized at the end, it becomes a no-op.

That sounds interesting. But do we have filesystems on Linux that can do that already, or is this still a wishlist item? Also worth noting: at least one well-known implementation in another OS was deprecated [1], citing complexity and lack of popularity as the reasons for that decision, and the feature is missing in their next-gen FS. So maybe it's not that great after all?

Anyway in the current toolbox besides --force-unsafe-io we also have:
- volume or FS snapshots, for similar or better safety but not the automatic performance gains; probably not (yet?) available on most systems
- the auto_da_alloc ext4 mount option that AIUI should do The Right Thing in dpkg's use case even without the fsync, actual reliability and performance impact unknown; appears to be set by default on trixie
- eatmydata
- io_uring, which allows asynchronous file operations; implementing it would require significant changes in dpkg; potential performance gains in dpkg's use case have not been evaluated yet AFAIK, but it looks like the right solution for that use case.

BTW for those interested in reading a bit more about the historical and current context around this issue aka O_PONIES I'm adding a few links at [2].

> but e.g. puppet might become confused

Heh. Ansible wins again.

> So no, we cannot drop the fsync(). :\

Nowadays, most machines are unlikely to suffer a power failure at the worst possible time: laptops and other battery-powered mobile devices have replaced desktop PCs in many workplaces and homes, and machines in datacenters usually have redundant power supplies plus battery and generator backups. And the default filesystem for new installations, ext4, is mounted with auto_da_alloc by default, which should make dropping the fsync safe. Whether that will result in significant performance gains is IMO something to be tested.

If the measured performance gain makes it interesting to drop the fsync, maybe this could become a configuration item that is set automatically in most cases by detecting the machine type (battery, dual PSU, container, VM => drop fsync) and filesystem (safe fs and mount options => drop fsync) or by asking the user in other cases or in expert install mode, defaulting to the safer --no-force-unsafe-io.
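The filesystem part of such a detection could look roughly like the following Python sketch. Everything in it is an assumption on my side: the function names and the policy are hypothetical, only ext4 is treated as "safe", and I am assuming that /proc/mounts lists noauto_da_alloc only when the default has been turned off. Machine type detection (battery, PSU, VM) is not covered, and anything unknown falls back to keeping the fsync.

```python
import os


def mount_entry_for(path, mounts_text):
    """Return (mount_point, fstype, options) of the deepest mount containing `path`."""
    best = None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        # Prefer the longest matching mount point; later entries win ties.
        if wanted.startswith(prefix) and (best is None or len(mount_point) >= len(best[0])):
            best = (mount_point, fstype, options)
    return best


def may_skip_fsync(path, mounts_text):
    """Hypothetical policy: skip fsync only on ext4 with auto_da_alloc left enabled."""
    entry = mount_entry_for(path, mounts_text)
    if entry is None:
        return False  # unknown filesystem: stay on the safe side
    _, fstype, options = entry
    return fstype == "ext4" and "noauto_da_alloc" not in options


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(may_skip_fsync("/var/lib/dpkg", f.read()))
```

Mount points containing spaces are escaped in /proc/mounts and are not handled here.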

Cheers,


[1]: https://learn.microsoft.com/en-us/windows/win32/fileio/deprecation-of-txf
[2]: https://lwn.net/Articles/351422/
     https://lwn.net/Articles/322823/
     https://lwn.net/Articles/1001770/

--
Julien Plissonneau Duquène
