> cephfs does not create/use object "4.00000000". Please show us some
> of its keys.
>
https://pastebin.com/WLfLTgni

Thanks

> On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch <d...@unimelb.edu.au> wrote:
>>
>> Hi all,
>>
>> We have a large omap object warning on one of our Ceph clusters.
>> The only reports I've seen regarding the "large omap objects" warning from
>> other users were related to RGW bucket sharding, however we do not have RGW
>> configured on this cluster.
>> The large omap object (~10GB) resides in a CephFS metadata pool.
>>
>> It's perhaps worth mentioning that we had to perform disaster recovery
>> steps [1] on this cluster last year after a network issue, so we're not
>> sure whether this large omap object is a result of those previous recovery
>> processes or whether it's completely unrelated.
>>
>> Ceph version: 12.2.8
>> osd_objectstore: Bluestore
>> RHEL 7.5
>> Kernel: 4.4.135-1.el7.elrepo.x86_64
>>
>> We have set: "mds_bal_fragment_size_max": "500000" (default 100000)
>>
>> $ ceph health detail
>> HEALTH_WARN 1 large omap objects
>> LARGE_OMAP_OBJECTS 1 large omap objects
>> 1 large objects found in pool 'hpcfs_metadata'
>> Search the cluster log for 'Large omap object found' for more details.
>>
>> # Find the pg containing the large omap object
>> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query | grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
>> 20.103: 1
>>
>> # OSD log entry showing the relevant object
>> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large omap object found. Object: 20:c0ce80d4:::4.00000000:head Key count: 24698995 Size (bytes): 11410935690
>>
>> # Confirm default warning thresholds for large omap objects
>> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
>> "osd_deep_scrub_large_omap_object_key_threshold": "2000000",
>> "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
>>
>> # Dump keys/values of the problematic object; this creates a 46.65GB file
>> $ rados -p hpcfs_metadata listomapvals '4.00000000' > /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> $ ll /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>>
>> # Confirm the key count matches the OSD log entry warning
>> $ rados -p hpcfs_metadata listomapkeys '4.00000000' | wc -l
>> 24698995
>>
>> # The omap keys/vals for that object appear to have been unchanged/static
>> # for at least a couple of months:
>> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> fd00ceb68607b477626178b2d81fefb926460107  /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4_00000000_20190108
>> fd00ceb68607b477626178b2d81fefb926460107  /tmp/hpcfs_metadata_object_omap_vals_4_00000000_20190108
>>
>> I haven't gone through all 24698995 keys yet, but while most appear to
>> relate to objects in the hpcfs_data CephFS data pool, there are a
>> significant number of keys (rough guess 25%) that don't appear to have
>> corresponding objects in the hpcfs_data pool.
>>
>> Any assistance or pointers to troubleshoot further would be very much
>> appreciated.
>>
>> Thanks,
>> Dylan
>>
>> [1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
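
A minimal sketch of how the key sample behind that pastebin could be produced, reusing only the pool and object names from the quoted commands above (the sample sizes are arbitrary):

# Sample a small number of omap keys instead of dumping all ~24.7M of them
$ rados -p hpcfs_metadata listomapkeys '4.00000000' | head -n 50

# The corresponding values can be spot-checked the same way; listomapvals
# prints a hex/ascii dump per entry, so keep the sample small
$ rados -p hpcfs_metadata listomapvals '4.00000000' | head -n 200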
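
On the observation that roughly 25% of the keys have no matching object in hpcfs_data, a rough sketch of that kind of spot-check, assuming the inode number for an entry has already been extracted from its omap key/value (the inode below is a made-up example):

# CephFS data objects are named "<inode-in-hex>.<block-number>", so the first
# block of a file should exist as "<ino>.00000000" in the data pool.
# Note: a file that never had data written to it won't have a backing object.
$ ino=10000000000    # hypothetical inode number (hex); substitute a real one
$ rados -p hpcfs_data stat "${ino}.00000000" >/dev/null 2>&1 \
      && echo "${ino}: backing object present in hpcfs_data" \
      || echo "${ino}: no backing object found in hpcfs_data"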