On Thu, Dec 15, 2022 at 9:32 AM Stolte, Felix <f.sto...@fz-juelich.de> wrote:
>
> Hi Patrick,
>
> We used your script to repair the damaged objects over the weekend and it
> went smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis; the
> runtime is about 6 hours. Until Thursday last week, we saw exactly the same
> 17 files. On Thursday at 13:05 a snapshot was created and our active MDS
> crashed once at that moment:
>
> 2022-12-08T13:05:48.919+0100 7f440afec700 -1 
> /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void 
> ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 
> 2022-12-08T13:05:48.921223+0100
> /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state ==
> LOCK_XLOCK || state == LOCK_XLOCKDONE)
>
> Twelve minutes later the unlink_local error crashes appeared again, this
> time with a new file. During debugging we noticed an MTU mismatch between
> the MDS (1500) and the client (9000) with a CephFS kernel mount. The client
> is also creating the snapshots via mkdir in the .snap directory.
>
> We disabled snapshot creation for now, but we really need this feature. I
> uploaded the MDS logs of the first crash along with the information above to
> https://tracker.ceph.com/issues/38452
>
> I would greatly appreciate it if you could answer the following question:
>
> Is the bug related to our MTU mismatch? We also fixed the MTU issue over the
> weekend by going back to 1500 on all nodes in the Ceph public network.

I doubt it.
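
For what it's worth, a quick way to confirm the MTU is now consistent
everywhere (a sketch; "eth0" is a placeholder for your actual public-network
interface):

    # run on each node in the ceph public network and on the client
    ip -o link show eth0 | awk '{print $2, $5}'   # prints "<iface>: <mtu>"
    # or, equivalently:
    cat /sys/class/net/eth0/mtu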

> If you need a debug level 20 log of the ScatterLock for further analysis, I
> could schedule snapshots at the end of our workdays and increase the debug
> level for 5 minutes around snapshot creation.

This would be very helpful!
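
Something along these lines should do it (a sketch assuming a 16.2.x cluster
using the centralized config store and a kernel mount at /mnt/cephfs; adjust
names and paths to your environment):

    # raise MDS logging shortly before the snapshot is taken
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1

    # trigger the snapshot from the client (mount path is an assumption)
    mkdir /mnt/cephfs/.snap/debug-$(date +%Y%m%d-%H%M)

    # a few minutes later, restore the defaults
    ceph config set mds debug_mds 1/5
    ceph config set mds debug_ms 0

Then please attach the resulting MDS log to the tracker ticket.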

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
