24.12.2024 17:10, Simon Richter wrote:
Hi,
On 12/24/24 18:54, Michael Tokarev wrote:
The no-unsafe-io workaround in dpkg was needed for 2005-era ext2fs
issues, where a power cut in the middle of a filesystem metadata
operation (which dpkg does a lot of) might result in an inconsistent
filesystem state.
The thing it protects against is a missing ordering between a write() to the
contents of an inode and a rename() updating the name referring to it.
These are unrelated operations even in other file systems, unless you use
data journaling ("data=journal") to force all operations through the journal,
in order. Normally ("data=ordered") you only get the metadata update marking
the data valid after the data has been written, but with no ordering relative
to the file name change.
The order of operations needs to be:
1. create .dpkg-new file
2. write data to .dpkg-new file
3. link existing file to .dpkg-old
4. rename .dpkg-new file over final file name
5. clean up .dpkg-old file
When we reach step 4, the data needs to have been written to disk and the
metadata in the inode referenced by the .dpkg-new file updated; otherwise we
atomically replace the existing file with one that is not yet guaranteed to
be written out.
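For illustration, the five steps above, including the fsync() that step 4 depends on, might look like this in C. This is a minimal sketch assuming a POSIX system, not dpkg's actual code; `replace_file`, `write_all` and `with_suffix` are hypothetical names, and the fixed 4096-byte path buffers are a simplification.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Build "<path><suffix>" into out; fails if the result would be truncated. */
static int with_suffix(char *out, size_t size, const char *path, const char *suffix)
{
    int n = snprintf(out, size, "%s%s", path, suffix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: replace `path` with `data` following steps 1-5,
 * with an fsync() between step 2 and step 4. */
int replace_file(const char *path, const char *data, size_t len)
{
    char new_path[4096], old_path[4096];
    if (with_suffix(new_path, sizeof new_path, path, ".dpkg-new") < 0 ||
        with_suffix(old_path, sizeof old_path, path, ".dpkg-old") < 0)
        return -1;

    /* 1. create the .dpkg-new file */
    int fd = open(new_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 2. write the data, then fsync() so the contents and inode
     *    metadata are on disk before any name can point at them */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(new_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(new_path);
        return -1;
    }

    /* 3. keep the existing file reachable as .dpkg-old; a missing
     *    original (first install) is not an error */
    unlink(old_path); /* drop a stale backup from an earlier run, if any */
    if (link(path, old_path) < 0 && errno != ENOENT) {
        unlink(new_path);
        return -1;
    }

    /* 4. atomically rename .dpkg-new over the final file name */
    if (rename(new_path, path) < 0)
        return -1;

    /* 5. clean up the .dpkg-old file */
    unlink(old_path);
    return 0;
}
```

Note that this only orders the data before the rename. Making the rename itself durable would additionally need an fsync() on a descriptor for the containing directory, which the sketch leaves out.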
This brings up a question: how did dpkg work before ext2fs started showing
this zero-length-file behavior? IIRC it was rather safe, no?
What you're describing seems reasonable. But I wonder if we can do better here.
How about doing steps 1..3 for *all* files in the package, and only
after that doing a single fsync() and the remaining steps 4..5, again
for all files?
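A rough sketch of that batched ordering in C, assuming a POSIX system. One substitution is made: fsync() only flushes a single file descriptor, so instead of one fsync() call the sketch runs a barrier pass that fsync()s every .dpkg-new file before the first rename. (On Linux, syncfs() could flush the whole filesystem in one call; POSIX sync() does not guarantee the writes have completed when it returns.) `struct pkg_file`, `install_files` and the helpers are hypothetical names, not dpkg internals.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical description of one file shipped in a package. */
struct pkg_file {
    const char *path; /* final file name */
    const char *data; /* new contents */
    size_t len;
    int fd;           /* open .dpkg-new descriptor, or -1 */
};

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Build "<path><suffix>" into out; fails if the result would be truncated. */
static int with_suffix(char *out, size_t size, const char *path, const char *suffix)
{
    int n = snprintf(out, size, "%s%s", path, suffix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: install all files with a single durability barrier.
 * On failure, open descriptors are closed; leftover .dpkg-new files are
 * left for the caller to clean up. */
int install_files(struct pkg_file *files, size_t n)
{
    char new_path[4096], old_path[4096];
    size_t i;

    for (i = 0; i < n; i++)
        files[i].fd = -1;

    /* Pass 1: steps 1..3 for every file, without syncing yet. */
    for (i = 0; i < n; i++) {
        if (with_suffix(new_path, sizeof new_path, files[i].path, ".dpkg-new") < 0 ||
            with_suffix(old_path, sizeof old_path, files[i].path, ".dpkg-old") < 0)
            goto fail;
        files[i].fd = open(new_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (files[i].fd < 0 ||
            write_all(files[i].fd, files[i].data, files[i].len) < 0)
            goto fail;
        unlink(old_path); /* drop a stale backup, if any */
        if (link(files[i].path, old_path) < 0 && errno != ENOENT)
            goto fail;
    }

    /* Barrier: every .dpkg-new file is on disk before any rename. */
    for (i = 0; i < n; i++) {
        int rc = fsync(files[i].fd);
        if (close(files[i].fd) < 0)
            rc = -1;
        files[i].fd = -1;
        if (rc < 0)
            goto fail;
    }

    /* Pass 2: steps 4..5 for every file. */
    for (i = 0; i < n; i++) {
        if (with_suffix(new_path, sizeof new_path, files[i].path, ".dpkg-new") < 0 ||
            with_suffix(old_path, sizeof old_path, files[i].path, ".dpkg-old") < 0)
            return -1;
        if (rename(new_path, files[i].path) < 0)
            return -1;
        unlink(old_path);
    }
    return 0;

fail:
    for (i = 0; i < n; i++)
        if (files[i].fd >= 0)
            close(files[i].fd);
    return -1;
}
```

Keeping one descriptor open per file between the passes is the simplest way to fsync() them later, but for large packages it can run into the process's open-file limit.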
Thanks,
/mjt