Hi Patrick,

We used your script over the weekend to repair the damaged objects, and it went
smoothly. Thanks for your support.

We adjusted your script to scan for damaged files on a daily basis; the runtime
is about 6 h. Until Thursday last week we had exactly the same 17 files. On
Thursday at 13:05 a snapshot was created, and our active MDS crashed once at
exactly that moment:

2022-12-08T13:05:48.919+0100 7f440afec700 -1 
/build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void 
ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 
2022-12-08T13:05:48.921223+0100
/build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state ==
LOCK_XLOCK || state == LOCK_XLOCKDONE)

Twelve minutes later the unlink_local crashes appeared again, this time with a
new file. During debugging we noticed an MTU mismatch between the MDS (1500)
and a client (9000) using the cephfs kernel mount. That client also creates the
snapshots via mkdir in the .snap directory.

We disabled snapshot creation for now, but we really need this feature. I
uploaded the MDS logs of the first crash, along with the information above, to
https://tracker.ceph.com/issues/38452

I would greatly appreciate it if you could answer the following question:

Is the bug related to our MTU mismatch? Over the weekend we also fixed the MTU
issue by going back to 1500 on all nodes in the Ceph public network.

If you need a debug level 20 log of the ScatterLock crash for further analysis,
I could schedule snapshots at the end of our workdays and raise the debug level
for five minutes around snapshot creation.
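For what it's worth, that schedule could look roughly like the following sketch
(the mount point and snapshot name are placeholders; it assumes a central
config database, as with `ceph config set`):

```shell
#!/bin/sh
# Hypothetical end-of-workday snapshot with a temporary debug window.
# /mnt/cephfs/share is a placeholder for the snapshotted directory.

ceph config set mds debug_mds 20                     # raise MDS debug level
ceph config set mds debug_ms 1

mkdir "/mnt/cephfs/share/.snap/eod-$(date +%Y-%m-%d)"  # create the snapshot

sleep 300                                            # keep debug 20 for ~5 min

ceph config set mds debug_mds 1/5                    # restore defaults
ceph config set mds debug_ms 0
```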

Regards
Felix
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------

On 02.12.2022 at 20:08, Patrick Donnelly <pdonn...@redhat.com> wrote:

On Thu, Dec 1, 2022 at 5:08 PM Stolte, Felix <f.sto...@fz-juelich.de> wrote:

The script has been running for ~2 hours, and according to the line count in
the memo file we are at 40% (CephFS is still online).

We had to modify the script, putting a try/except around the for loop in lines
78 to 87. For some reason there are some objects (186 at the moment) that throw
a UnicodeDecodeError exception during iteration:

<rados.OmapIterator object at 0x7f9606f8bcf8>
Traceback (most recent call last):
  File "first-damage.py", line 138, in <module>
    traverse(f, ioctx)
  File "first-damage.py", line 79, in traverse
    for (dnk, val) in it:
  File "rados.pyx", line 1382, in rados.OmapIterator.__next__
  File "rados.pyx", line 311, in rados.decode_cstr
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-11: invalid continuation byte

I don’t know if this is caused by the filesystem still running. We saved the
object names in a separate file, and I will investigate further tomorrow. We
should be able to modify the script to check only the objects that threw the
exception instead of searching through the whole pool again.
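The workaround we applied is roughly the following (a minimal, self-contained
sketch: `fake_iter` and the object name stand in for rados.OmapIterator and a
real metadata object, they are not the actual first-damage.py code):

```python
# Sketch of the try/except workaround: skip omap iteration for objects whose
# keys fail UTF-8 decoding and record the object name for a targeted re-check.
# All names below are illustrative.

def scan_omap(obj_name, it, bad_objects):
    """Collect omap entries; on UnicodeDecodeError, remember the object."""
    entries = []
    try:
        for dnk, val in it:
            entries.append((dnk, val))
    except UnicodeDecodeError:
        # Save the object name so only these objects need re-checking,
        # instead of scanning the whole metadata pool again.
        bad_objects.append(obj_name)
    return entries

def fake_iter(raw_keys):
    # Decodes keys the way rados.decode_cstr would, raising on bad UTF-8.
    for raw in raw_keys:
        yield raw.decode("utf-8"), b""

bad = []
good = scan_omap("10000000abc.00000000",
                 fake_iter([b"report.txt_head", b"\xff\xfe_head"]), bad)
```

Note that this abandons the remaining entries of an object after the first bad
key; since the affected objects get re-checked separately, that was acceptable
for us.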

That shouldn't be caused by the fs running. It may be that you have some
file names containing invalid unicode characters?

Regarding the mds logfiles with debug 20:
We cannot run this debug level for longer than one hour, since the logfile
growth is too high for the local storage on the MDS servers where the logs are
stored (we don’t have central logging yet).

Okay.

But if you are just interested in the time frame around the crash, I could set
the debug level to 20, trigger the crash over the weekend, and send you the logs.

The crash is unlikely to point to what causes the corruption. I was
hoping we could locate an instance of damage while the MDS is running.

Regards Felix



On 01.12.2022 at 20:51, Patrick Donnelly <pdonn...@redhat.com> wrote:

On Thu, Dec 1, 2022 at 3:55 AM Stolte, Felix <f.sto...@fz-juelich.de> wrote:


I set debug_mds=20 in ceph.conf and applied it to the running daemon via "ceph
daemon mds.mon-e2-1 config set debug_mds 20". I have to check with my
superiors whether I am allowed to provide you the logs, though.


Suggest using `ceph config set` instead of ceph.conf. It's much easier.

Regarding the tool:
<pool> refers to the cephfs_metadata pool? (Just want to be sure.)


Yes.

How long will the runs take? We have 15M objects in our metadata pool and
330M in our data pools.


Not sure. You can monitor the number of lines generated on the memo
file to get an idea of objects/s.

You can speed test the tool without bringing the file system down by **not**
using `--remove`.
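One rough way to sample that rate (the memo file path below is a placeholder
for wherever the tool writes its memo file):

```shell
# Estimate objects/s from the memo file's line-count growth over 60 seconds.
# /tmp/memo is a placeholder for the tool's actual memo file path.
n1=$(wc -l < /tmp/memo)
sleep 60
n2=$(wc -l < /tmp/memo)
echo "$(( (n2 - n1) / 60 )) objects/s"
```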

Regarding the root cause:
As far as I can tell, all damaged inodes have only been accessed via two Samba
servers running with CTDB. We are also running NFS gateways on different
systems, but there hasn’t been a damaged inode there (yet).

The Samba servers run Ubuntu 18.04 with kernel 5.4.0-132 and Samba version
4.7.6.
CephFS is accessed via kernel mount.
The Ceph version is 16.2.10 across all nodes.
We have one filesystem and two data pools, and we are using CephFS snapshots.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D






_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
