Hi,
On 12/24/24 18:54, Michael Tokarev wrote:
> The no-unsafe-io workaround in dpkg was needed for 2005-era ext2fs
> issues, where a power-cut in the middle of a filesystem metadata
> operation (which dpkg does a lot of) might result in an inconsistent
> filesystem state.
The thing it protects against is a missing ordering between a write()
to the contents of an inode and a rename() updating the name referring
to it.
These are unrelated operations even in other file systems, unless you
use data journaling ("data=journal") to force all operations to the
journal, in order. Normally ("data=ordered") you only get the metadata
update marking the data valid after the data has been written, but with
no ordering relative to the file name change.
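To make the hazard concrete, here is a hypothetical C sketch (invented
file names, error handling omitted -- not dpkg's actual code) of the
unordered pattern under data=ordered:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    void unsafe_replace(const char *data, size_t len)
    {
        int fd = open("config.dpkg-new",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        /* the contents may still sit only in the page cache here */
        write(fd, data, len);
        close(fd);
        /* the name change is journaled independently of the data, so
         * after a power cut "config" may exist but be empty */
        rename("config.dpkg-new", "config");
    }
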
The order of operations needs to be:
1. create .dpkg-new file
2. write data to .dpkg-new file
3. link existing file to .dpkg-old
4. rename .dpkg-new file over final file name
5. clean up .dpkg-old file
When we reach step 4, the data needs to be written to disk and the
metadata in the inode referenced by the .dpkg-new file updated,
otherwise we atomically replace the existing file with one that is not
yet guaranteed to be written out.
We get two assurances from the file system here:
1. the file will not contain garbage data -- the number of bytes marked
valid will be less than or equal to the number of bytes actually
written. The number of valid bytes will be zero initially, and only
after the data has been written out is the metadata update to the final
value added to the journal.
2. creation of the inode itself will be written into the journal before
the rename operation, so the file never vanishes.
What this does not protect against is the file pointing to a zero-size
inode. The only way to avoid that is either data journaling, which is
horribly slow and creates extra writes, or fsync().
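As a minimal sketch (again with invented file names and no error
handling, not dpkg's actual code), the fsync() slots into the sequence
above like this:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    void safe_replace(const char *data, size_t len)
    {
        /* 1. create the .dpkg-new file */
        int fd = open("config.dpkg-new",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);

        /* 2. write the new contents */
        write(fd, data, len);

        /* force the data -- and the inode metadata marking it valid --
         * to disk before the rename below, so the final name can never
         * end up pointing at an empty or partially written inode */
        fsync(fd);
        close(fd);

        /* 3. keep a backup link to the existing file */
        link("config", "config.dpkg-old");

        /* 4. atomically switch the name over to the new inode */
        rename("config.dpkg-new", "config");

        /* 5. clean up the backup */
        unlink("config.dpkg-old");
    }
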
> Today, doing an fsync() really hurts - with SSDs/flash it reduces
> the lifetime of the storage, and for many modern filesystems it is a
> costly operation which bloats the metadata tree significantly,
> resulting in all further operations becoming inefficient.
This should not make any difference in the number of write operations
necessary, and only affect ordering. The data, metadata journal and
metadata update still have to be written.
The only way this could be improved is with a filesystem-level
transaction, where we can ask the file system to perform the entire
update atomically -- then all the metadata updates can be queued in RAM,
held back until the data has been synchronized by the kernel in the
background, and then added to the journal in one go. I would expect that
with such a file system, fsync() becomes cheap, because it would just be
added to the transaction, and if the kernel gets around to writing the
data before the entire transaction is synchronized at the end, it
becomes a no-op.
This assumes that maintainer scripts can be included in the transaction
(otherwise we need to flush the transaction before invoking a
maintainer script), and that no external process records the successful
execution and expects it to be persisted. apt makes no such assumption
because it reads the dpkg status, so it is safe -- but e.g. Puppet
might become confused if an operation it marked as successful is rolled
back by a power loss.
What could make sense is more aggressively promoting this option for
containers and similar throwaway installations where it is guaranteed
that after a power loss the entire workspace is thrown away, such as
when working in a CI environment.
However, even that is not guaranteed: if I create a Docker image for
reuse, Docker will mark the image creation as successful when the
command returns. Again, there is no ordering guarantee between the
container contents and the database outside the container that records
the success of the operation.
So no, we cannot drop the fsync(). :\
Simon