Hi Gregor,
On 11.12.2024 at 12:01, Gregor Zattler wrote:
> Hi,
>
> * Gregor Zattler <telegr...@gmx.net> [2024-12-09; 01:54 +01]:
>> Dear debian enthusiasts, I use rdiff-backup, which now is not able
>> to work with my most precious backup, instead throws a python
>> backtrace which contains:
>>
>>   OSError: [Errno 117] Structure needs cleaning: b'/mnt/mic-backup/rdiff-backup/durable/rdiff-backup-data/increments/home/grfz/.procmail/backup-post-mailmunge/new'

rdiff-backup, I guess, creates and manages its backup history by using
hard links between different "generations" of backup directories.
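To illustrate what such a hard-link based scheme looks like on disk
(a hypothetical layout, not necessarily rdiff-backup's actual one):

  # new "generation": every file hard-linked to the previous one, no data copied
  cp -al backup.0 backup.1
  # both names show the same inode number and a link count >= 2
  ls -li backup.0/somefile backup.1/somefile

Each generation then only costs directory entries and inodes, which is
exactly where things can get tight.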

>> While a fsck.ext4 -vvvtfDfy on that file system gives
>>
>>   Failed to optimize directory /rdiff-backup/durable/rdiff-backup-data/increments/home/grfz/.procmail/backup-post-mailmunge/new (498074110): Directory block does not have space for checksum
>>
>> in Pass 3A: Optimizing directories.

So the file system is so full that it no longer allows further
operations, including fixes.

>> Because of
>> https://blogs.oracle.com/linux/post/space-management-with-large-directories-in-ext4
>> I tried to copy said directory:
>>
>>   cp -a new neu
>>
>> This too does not work:
>>
>>   cp: cannot access 'new': Structure needs cleaning
>>   cp: preserving times for 'neu': Read-only file system

>> Any ideas how to repair said directory, clean the structure, or
>> another workaround to at least get rdiff-backup to use the backup
>> again for restoring?

This does not look like there will be an easy solution.

I know nothing about rdiff-backup. Also, if fsck cannot do its work
any more due to limitations or constraints, you will probably have to
make a tough decision:

- Ask some *real* experts. In this case, probably ext file system
  developers, or a professional data recovery company.
- Try to salvage what you can by manually copying data out.
- Give up, wipe the complete file system, and make a note to avoid
  such situations in the future.

>> Or where to ask? The ext3-users mailing list does not exist any
>> more?

Linux kernel mailing lists might help, at least to ask where to find the
ext family developers.

>> Or how to avoid such a problem next time?

Avoid excessive amounts of hard links.
The problem is to know what the practical limits are.
Unfortunately, you will usually only know *after* hitting some wall.
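
A rough way to watch for that kind of pressure before it bites (the
mount point is yours, the link count threshold is an arbitrary guess):

  # inode usage, which a plain df does not show
  df -i /mnt/mic-backup
  # files with very high hard link counts (%n prints the link count)
  find /mnt/mic-backup -xdev -type f -links +1000 -printf '%n %p\n'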

> This directory in question is really huge:
>
>   ls -Altr
>   [...]
>   -rwx-----x 1 grfz grfz          0 Nov 21 01:26 new.2024-11-21T01:32:48+01:00.dir
>   -rwx------ 1 grfz grfz          0 Nov 21 16:37 tmp.2024-11-21T16:42:36+01:00.dir
>   -rwx-----x 1 grfz grfz          0 Nov 21 16:37 new.2024-11-21T16:42:36+01:00.dir
>   -rwx------ 1 grfz grfz          0 Nov 21 23:12 tmp.2024-11-21T23:36:57+01:00.dir
>   -rwx-----x 1 grfz grfz          0 Nov 21 23:12 new.2024-11-21T23:36:57+01:00.dir
>   -rwx------ 1 grfz grfz          0 Nov 23 01:01 tmp.2024-11-23T01:25:22+01:00.dir
>   -rwx-----x 1 grfz grfz          0 Nov 23 01:01 new.2024-11-23T01:25:22+01:00.dir
>   -rwx------ 1 grfz grfz          0 Nov 25 00:41 tmp.2024-11-25T00:44:07+01:00.dir
>   -rwx-----x 1 grfz grfz          0 Nov 25 00:41 new.2024-11-25T00:44:07+01:00.dir
>   -rwx------ 1 grfz grfz          0 Nov 25 15:54 tmp.2024-11-25T15:56:34+01:00.dir
>   -rwx-----x 1 grfz grfz          0 Nov 25 15:54 new.2024-11-25T15:56:34+01:00.dir
>   -rwx------ 1 grfz grfz          0 Nov 26 00:27 tmp.2024-11-26T00:50:56+01:00.dir
>   -rwx-----x 1 grfz grfz          0 Nov 26 00:27 new.2024-11-26T00:50:56+01:00.dir
>   -rwx-----x 1 grfz grfz          0 Nov 28 01:06 new.2024-11-28T01:34:32+01:00.dir
>   drwx------ 2 root root 2139340800 Nov 28 01:37 new

It would be very interesting to understand the inode allocation and use
through hard links, but I suspect you will not find user-friendly tools
such as ls, du, df, stat or fsck able to help here. The problem is not
with those tools, but with the file system code itself, which usually
also implies that you have no way to run any automatic recovery at all
any more. You might be able to get somewhere using debugfs, but I have
no experience with it, so I cannot even give you a starting point.
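
For what it's worth, poking at the broken directory with debugfs might
look roughly like this (read-only; the inode number is the one from
your fsck output; whether any of this survives the corruption is an
open question):

  debugfs -c /dev/dm-6
  # -c: catastrophic mode, skips reading the inode and group bitmaps
  debugfs:  stat <498074110>
  # try to list the directory's entries
  debugfs:  ls -l <498074110>
  # attempt to copy the subtree out to intact storage
  debugfs:  rdump <498074110> /some/rescue/dir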

> I tried to move the files in the directory to another one, but this
> gives
>
>   mv: cannot stat FILENAME: Bad message
>
> So I cannot stat, mv, cp, cat these files, or at least some of them.

As stated above, I'm not surprised.

> dmesg shows 35 lines like this one:
>
>   ext4_dirblock_csum_verify:405: inode #498074110: comm ls: No space for directory leaf checksum. Please run e2fsck -D
>
> and this one:
>
>   [76268.580904] EXT4-fs error (device dm-6): ext4_readdir:218: inode #498074110: comm ls: path (unknown): directory fails checksum at offset 0

So pretty clearly some file system structures have run out of space,
and more space cannot be allocated. Given that the first message is
about lacking space for a checksum, I would try disabling all
checksumming and see if that allows you to recover some files.
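
If you try that, I would expect it to look something like the
following, but only ever on a copy of the device, and I have not
verified that tune2fs accepts this on a damaged file system:

  umount /mnt/mic-backup
  # turn off metadata checksumming
  tune2fs -O ^metadata_csum /dev/dm-6
  # then see what is readable
  mount -o ro /dev/dm-6 /mnt/mic-backup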

> But fsck -fDy does not help (as stated in the quoted part, above).
>
> After fsck.ext4 -vvvtfDfy, a fsck without any options tells me the
> fs is clean.
>
> Any ideas, pointers?

What I would do is the following.

1. Find storage to store a full dump of the file system (or, actually,
its underlying block storage). Create such a dump, because it's quite
likely that further problems will arise. Dump in this context means dd,
not the file system dump tool. (A command sketch for steps 1 to 4
follows after the list.)

2. Familiarize yourself with the ext file system options and the
debugfs tool. If possible, see if you can access the file system
without any features that add extra storage or inode requirements:
without the journal, without checksumming, without any redundancy, and
in particular read-only.

3. See if you can at least read some files. If you can, copy them
somewhere else.

4. Try to delete the copied files -- either by mounting read-write and
using regular rm, or using debugging tools. At this point you should
not expect that a deleted file actually frees up visible space (it
might just be a hard link...), but you may have a chance to free up
some of the directory or other data structures.

5. Repeat until you have noticeable free space, both bulk capacity and
inodes. You probably cannot check this with regular tools, so it is
more about debugging the file system, or trusting your gut feeling.

6. Eventually, try an fsck run. If it fails, go back to step 3.
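
A very rough sketch of steps 1 to 4 on the command line (device name
and paths taken from your output or made up; the mount options and
targets are assumptions, not something I have tested on your setup):

  # 1. raw dump of the block device (GNU ddrescue would also be an option)
  dd if=/dev/dm-6 of=/big/storage/fs.img bs=64M conv=noerror,sync status=progress

  # 2. read-only access, without replaying the journal
  mount -o ro,noload /dev/dm-6 /mnt/rescue

  # 3. copy out whatever is still readable, logging the failures
  cp -a /mnt/rescue/home /salvage/ 2>/salvage/errors.log

  # 4. remount read-write and try to delete what was copied
  mount -o remount,rw /mnt/rescue
  rm -rf /mnt/rescue/home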

This will be a time-consuming and nerve-wracking exercise.

Even if that works, you'll lose most of the structure and metadata
represented in the file system contents, in particular everything
encoded in the hard link relationships of the directory tree.

Afterwards, it's time to investigate how you should have tuned the
file system for its use case, or at least how you could have monitored
its approach to operational limits. Perhaps it's also worth using a
different file system for such purposes.
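
Things I would look at when setting up the replacement (the feature
names are ext4's; whether they fit your workload is for you to judge):

  # what the current file system was created with
  tune2fs -l /dev/dm-6 | grep -Ei 'features|inode count'
  # hypothetical mkfs for a hard-link heavy fs: three-level htree
  # directories (large_dir) and more inodes (one per 8 KiB instead of
  # the default 16 KiB)
  mkfs.ext4 -O large_dir -i 8192 /dev/sdX1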

Personally, I would recommend against using such hard-link based backup
schemes in general, because they are too fragile and cannot be fixed,
as you have just found out.

Oh, and only after you have exhausted your options, implemented and
monitored an improved or different backup solution, and proved that
you can restore, is it time to delete the last copy of your broken
file system.

Good luck.
Arno
--
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück