Hi Patrick,

We used your script over the weekend to repair the damaged objects, and it went
smoothly. Thanks for your support.

We adjusted your script to scan for damaged files on a daily basis; the runtime
is about 6 h. Until Thursday last week we had exactly the same 17 files. On
Thursday at 13:05 a snapshot was created, and our active MDS crashed once at
exactly that moment:

2022-12-08T13:05:48.919+0100 7f440afec700 -1 
/build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void 
ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 
2022-12-08T13:05:48.921223+0100
/build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state ==
LOCK_XLOCK || state == LOCK_XLOCKDONE)

Twelve minutes later the unlink_local crashes appeared again, this time with a
new file. During debugging we noticed an MTU mismatch between the MDS (1500)
and a client (9000) using the cephfs kernel mount. That client also creates the
snapshots via mkdir in the .snap directory.

We disabled snapshot creation for now, but we really need this feature. I
uploaded the MDS logs of the first crash, along with the information above, to
https://tracker.ceph.com/issues/38452

I would greatly appreciate it if you could answer the following question:

Is the bug related to our MTU mismatch? Over the weekend we also fixed the MTU
issue by going back to 1500 on all nodes in the Ceph public network.

If you need a debug level 20 log of the ScatterLock crash for further analysis,
I could schedule snapshots at the end of our workdays and raise the debug level
for five minutes around snapshot creation.
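For what it's worth, that schedule could look roughly like the following sketch
(the mount point and snapshot name are placeholders; it assumes a central
config database, as with `ceph config set`):

```shell
#!/bin/sh
# Hypothetical end-of-workday snapshot with a temporary debug window.
# /mnt/cephfs/share is a placeholder for the snapshotted directory.

ceph config set mds debug_mds 20                     # raise MDS debug level
ceph config set mds debug_ms 1

mkdir "/mnt/cephfs/share/.snap/eod-$(date +%Y-%m-%d)"  # create the snapshot

sleep 300                                            # keep debug 20 for ~5 min

ceph config set mds debug_mds 1/5                    # restore defaults
ceph config set mds debug_ms 0
```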

Regards
Felix
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------

On 02.12.2022 at 20:08, Patrick Donnelly <pdonn...@redhat.com> wrote:

On Thu, Dec 1, 2022 at 5:08 PM Stolte, Felix <f.sto...@fz-juelich.de> wrote:

The script has been running for ~2 hours, and according to the line count in
the memo file we are at 40% (CephFS is still online).

We had to modify the script, putting a try/except around the for loop in lines
78 to 87. For some reason there are some objects (186 at the moment) that throw
a UnicodeDecodeError exception during iteration:

<rados.OmapIterator object at 0x7f9606f8bcf8>
Traceback (most recent call last):
  File "first-damage.py", line 138, in <module>
    traverse(f, ioctx)
  File "first-damage.py", line 79, in traverse
    for (dnk, val) in it:
  File "rados.pyx", line 1382, in rados.OmapIterator.__next__
  File "rados.pyx", line 311, in rados.decode_cstr
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-11: invalid continuation byte

I don’t know if this is caused by the filesystem still running. We saved the
object names in a separate file, and I will investigate further tomorrow. We
should be able to modify the script to check only the objects that threw the
exception instead of searching through the whole pool again.
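The workaround we applied is roughly the following (a minimal, self-contained
sketch: `fake_iter` and the object name stand in for rados.OmapIterator and a
real metadata object, they are not the actual first-damage.py code):

```python
# Sketch of the try/except workaround: skip omap iteration for objects whose
# keys fail UTF-8 decoding and record the object name for a targeted re-check.
# All names below are illustrative.

def scan_omap(obj_name, it, bad_objects):
    """Collect omap entries; on UnicodeDecodeError, remember the object."""
    entries = []
    try:
        for dnk, val in it:
            entries.append((dnk, val))
    except UnicodeDecodeError:
        # Save the object name so only these objects need re-checking,
        # instead of scanning the whole metadata pool again.
        bad_objects.append(obj_name)
    return entries

def fake_iter(raw_keys):
    # Decodes keys the way rados.decode_cstr would, raising on bad UTF-8.
    for raw in raw_keys:
        yield raw.decode("utf-8"), b""

bad = []
good = scan_omap("10000000abc.00000000",
                 fake_iter([b"report.txt_head", b"\xff\xfe_head"]), bad)
```

Note that this abandons the remaining entries of an object after the first bad
key; since the affected objects get re-checked separately, that was acceptable
for us.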

That shouldn't be caused by the fs running. It may be that you have some
file names containing invalid unicode characters?

Regarding the mds logfiles with debug 20:
We cannot run this debug level for longer than one hour, since the logfile
growth is too high for the local storage on the MDS servers where the logs are
stored (we don’t have central logging yet).

Okay.

But if you are just interested in the time frame around the crash, I could set
the debug level to 20, trigger the crash over the weekend, and send you the logs.

The crash is unlikely to point to what causes the corruption. I was
hoping we could locate an instance of damage while the MDS is running.

Regards Felix



On 01.12.2022 at 20:51, Patrick Donnelly <pdonn...@redhat.com> wrote:

On Thu, Dec 1, 2022 at 3:55 AM Stolte, Felix <f.sto...@fz-juelich.de> wrote:


I set debug_mds=20 in ceph.conf and applied it to the running daemon via "ceph
daemon mds.mon-e2-1 config set debug_mds 20". I have to check with my
superiors whether I am allowed to provide you the logs, though.


Suggest using `ceph config set` instead of ceph.conf. It's much easier.

Regarding the tool:
<pool> refers to the cephfs_metadata pool? (Just want to be sure.)


Yes.

How long will the runs take? We have 15M objects in our metadata pool and
330M in our data pools.


Not sure. You can monitor the number of lines generated on the memo
file to get an idea of objects/s.

You can speed test the tool without bringing the file system down by **not**
using `--remove`.
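One rough way to sample that rate (the memo file path below is a placeholder
for wherever the tool writes its memo file):

```shell
# Estimate objects/s from the memo file's line-count growth over 60 seconds.
# /tmp/memo is a placeholder for the tool's actual memo file path.
n1=$(wc -l < /tmp/memo)
sleep 60
n2=$(wc -l < /tmp/memo)
echo "$(( (n2 - n1) / 60 )) objects/s"
```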

Regarding the root cause:
As far as I can tell, all damaged inodes have only been accessed via two Samba
servers running with CTDB. We are also running NFS gateways on different
systems, but there hasn’t been a damaged inode there (yet).

The Samba servers run Ubuntu 18.04 with kernel 5.4.0-132 and Samba version
4.7.6.
CephFS is accessed via kernel mount.
The Ceph version is 16.2.10 across all nodes.
We have one filesystem and two data pools, and we are using CephFS snapshots.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D






_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
