August 9, 2025 at 12:27 AM, "Milos Nikic" <nikic.mi...@gmail.com> wrote:



> 
> Hi all,
> 
> Since my last email about the journaling subsystem, I’ve done a lot of work 
> on it.
> Here’s what’s new over the past few weeks:
> 
> **Integration & safety**
> 
> Journaling now writes directly to raw disk space *outside* ext2fs-managed 
> blocks.
> No more feedback loops or early boot write limitations.
> 
> journal_init() now reads its configuration (offset, size, etc.) directly from 
> four reserved fields in the ext2 superblock.
> If those fields are unset or invalid, journaling stays off and all hooks are 
> no-ops.
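
To check my understanding of the "unset or invalid means off" behaviour, here is a minimal sketch of how such a validation step might look. The struct layout, the field names, the magic value and the `*_sketch` functions are all my assumptions; the message only says there are four reserved superblock fields holding offset, size, etc. I also assume the journal area sits after the blocks ext2fs manages, which may not match the actual layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the original message only mentions "four reserved
   fields" (offset, size, etc.).  Names and meanings are assumptions.  */
struct journal_cfg
{
  uint32_t magic;        /* marks the fields as initialized */
  uint32_t start_block;  /* first block of the raw journal area */
  uint32_t num_blocks;   /* length of the journal area in blocks */
  uint32_t flags;        /* reserved for feature bits */
};

#define JOURNAL_CFG_MAGIC 0x4a524e4cu  /* "JRNL", made up for this sketch */

static bool journal_enabled = false;

/* Return true only if the journal area is configured, non-empty, lies
   beyond the blocks managed by the file system, and fits on the device.  */
static bool
journal_cfg_valid (const struct journal_cfg *cfg,
                   uint64_t fs_blocks, uint64_t device_blocks)
{
  if (cfg->magic != JOURNAL_CFG_MAGIC)
    return false;
  if (cfg->num_blocks == 0)
    return false;
  if (cfg->start_block < fs_blocks)
    return false;  /* would overlap ext2fs-managed blocks */
  if ((uint64_t) cfg->start_block + cfg->num_blocks > device_blocks)
    return false;  /* would run past the end of the device */
  return true;
}

/* Hypothetical stand-in for journal_init(): journaling stays off
   whenever validation fails.  */
static void
journal_init_sketch (const struct journal_cfg *cfg,
                     uint64_t fs_blocks, uint64_t device_blocks)
{
  journal_enabled = journal_cfg_valid (cfg, fs_blocks, device_blocks);
}

/* Hypothetical hook: returns 0 without doing anything while disabled.  */
static int
journal_hook_sketch (void)
{
  if (!journal_enabled)
    return 0;  /* no-op */
  return 1;    /* a real hook would record an event here */
}
```

With all-zero fields the magic check fails first, so an unformatted superblock leaves every hook inert.
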
> 
> Hooks into libdiskfs remain minimal and isolated; core paths are unchanged 
> when journaling is disabled.
> 
> **Filtering & replay improvements**
> 
> - Added a dedicated policy module to filter out noisy events (/tmp, build 
> outputs, etc.).
> 
> - Stronger inode fingerprinting to prevent misapplying updates after inode 
> reuse.
> 
> - Replay is now dual-path: inode-based first, falling back to path-based when 
> needed. 
> 
> - “Best effort” file recreation under /restore/[timestamp] with correct 
> metadata when files vanish after a crash.
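
For the policy module in the first bullet, I imagine something along the lines of the prefix filter below. This is only a sketch: the ignore list and the function names `path_under` and `journal_policy_should_log` are hypothetical, and the real module may well use different criteria.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ignore list; the message names /tmp and build outputs
   as examples of noisy locations.  */
static const char *const ignored_prefixes[] = { "/tmp", "/var/tmp" };

/* True if PATH equals PREFIX or lies below it, so "/tmp/x" matches
   "/tmp" but "/tmpfoo" does not.  */
static bool
path_under (const char *path, const char *prefix)
{
  size_t n = strlen (prefix);
  if (strncmp (path, prefix, n) != 0)
    return false;
  return path[n] == '\0' || path[n] == '/';
}

/* Return false for events on paths the policy considers noise.  */
static bool
journal_policy_should_log (const char *path)
{
  size_t i;
  for (i = 0; i < sizeof ignored_prefixes / sizeof ignored_prefixes[0]; i++)
    if (path_under (path, ignored_prefixes[i]))
      return false;
  return true;
}
```

Matching on a component boundary rather than a raw string prefix avoids dropping events for unrelated directories that merely share the first few characters.
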
> 
> **Two tricky problems took significant work:**
> 
> 1. **Path recovery:** cred->po->path often gives useful file paths, but 
> sometimes needs sanitizing or is imprecise. Combined with the current name, 
> it’s often enough to reconstruct missing files. Replay now uses path-based 
> recovery when inode-based recovery fails.
> 
> 2. **Aggressive inode reuse in ext2:** After deletion (say at fsck time, or any 
> time really) the same inode number may be reassigned to a completely 
> different file after reboot. Fingerprinting ensures we never apply stale 
> updates to the wrong file.
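
As I read the two points above, the replay decision could be sketched roughly as follows. The fingerprint fields (inode number, generation, file type) are my guess, since the message does not say what the fingerprint contains, and `replay_choose` and the enum are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fingerprint: the message does not list the fields used.  */
struct inode_fp
{
  uint32_t ino;         /* inode number, which ext2 may reuse */
  uint32_t generation;  /* assumed to change when the inode is reused */
  uint32_t file_type;   /* regular file, directory, ... */
};

enum replay_result { REPLAY_BY_INODE, REPLAY_BY_PATH, REPLAY_SKIPPED };

/* A logged record may only be applied by inode if every field agrees.  */
static bool
fp_matches (const struct inode_fp *logged, const struct inode_fp *on_disk)
{
  return logged->ino == on_disk->ino
    && logged->generation == on_disk->generation
    && logged->file_type == on_disk->file_type;
}

/* Decide how to replay one record.  ON_DISK is NULL when the inode no
   longer exists; PATH_USABLE says whether a sanitized path was logged.  */
static enum replay_result
replay_choose (const struct inode_fp *logged,
               const struct inode_fp *on_disk, bool path_usable)
{
  if (on_disk != NULL && fp_matches (logged, on_disk))
    return REPLAY_BY_INODE;   /* same file as when the record was written */
  if (path_usable)
    return REPLAY_BY_PATH;    /* inode gone or reused: fall back to path */
  return REPLAY_SKIPPED;      /* do no harm: drop the record */
}
```

The important property is that a reused inode number never reaches the inode path; at worst the record is skipped.
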
> 
> **Testing & results**
> 
> - Survived repeated hard reboots under concurrent create/delete stress.
> 
> - In chaos tests where fsck over-deleted files, journaling replay brought 
> them back as expected.
> 
> **Other changes**
> 
> - Removed unused async paths, watchers, and threads — code size is still 
> larger than before, but cleaner.
> 
> - Memory use during replay is controlled via fixed-size arenas.
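
By "fixed-size arenas" I assume something like a bump allocator over a static buffer, as in the sketch below. The size, the names and the reset-per-batch usage are assumptions on my part, not taken from the patch.

```c
#include <stddef.h>

#define ARENA_SIZE 4096  /* arbitrary size chosen for this sketch */

/* A fixed-size arena: allocations are carved from one buffer and
   released all at once, so replay memory use has a hard upper bound.  */
struct arena
{
  size_t used;
  union
  {
    max_align_t align;                /* forces suitable alignment */
    unsigned char bytes[ARENA_SIZE];
  } u;
};

/* Return SIZE bytes from the arena, or NULL if it does not fit.  */
static void *
arena_alloc (struct arena *a, size_t size)
{
  const size_t align = _Alignof (max_align_t);
  size_t start = (a->used + align - 1) & ~(align - 1);

  if (start > ARENA_SIZE || size > ARENA_SIZE - start)
    return NULL;  /* arena exhausted; the caller must cope */
  a->used = start + size;
  return a->u.bytes + start;
}

/* Drop every allocation at once, e.g. between replay batches.  */
static void
arena_reset (struct arena *a)
{
  a->used = 0;
}
```

A failed allocation returns NULL instead of growing the buffer, which is what keeps the bound fixed.
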
> 
> **Important scope note**
> 
> This is **not** a replacement for fsck, ext4-style transactions, or a strong 
> consistency guarantee.
> 
> It’s a *best-effort*, *do-no-harm* crash-recovery helper that complements 
> fsck by restoring metadata and paths opportunistically.
> 
> When disabled or misconfigured, it is inert and has zero impact on normal 
> operation.
> 
> **Future work ideas**
> 
> - Better path preservation to improve replay accuracy.
> 
> - Per-node timelines for smarter change grouping.
> 
> - Integration with ext tooling to support formatting with journaling fields 
> and an 8 MiB carve-out.
> 
> - Exporting replay stats via a /proc-like interface.
> 
> This patch is large (~4.6 kLOC) but self-contained — most of it is in new 
> libdiskfs/journal_*.c files.
> If preferred, I can break it into a smaller series.
> 
> Let me know your thoughts!
> 
> Thanks,

This is awesome!  Thanks for the contribution!

Is this still the best guide for how to use your best-effort journal?

https://lists.gnu.org/archive/html/bug-hurd/2025-07/msg00048.html
