>please run the following commands. They will show where 4.00000000 is:
>
>rados -p hpcfs_metadata getxattr 4.00000000 parent >/tmp/parent
>ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
>

$ ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
{
    "ino": 4,
    "ancestors": [
        {
            "dirino": 1,
            "dname": "lost+found",
            "version": 1
        }
    ],
    "pool": 20,
    "old_pools": []
}

So 4.00000000 is the directory object for lost+found (inode 4) under the 
filesystem root. I guess it may contain a very large number of files from the 
previous recovery operations?
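
A rough cross-check would be to count the entries in lost+found from a client 
mount; the result should be in the same ballpark as the ~24.7 million omap 
keys reported for 4.00000000. A minimal sketch, assuming a hypothetical mount 
point /mnt/hpcfs (-f avoids sorting millions of names):

$ ls -f /mnt/hpcfs/lost+found | wc -l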

>On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch <d...@unimelb.edu.au> wrote:
>>
>> >> >> >cephfs does not create/use object "4.00000000".  Please show us some
>> >> >> >of its keys.
>> >> >> >
>> >> >>
>> >> >> https://pastebin.com/WLfLTgni
>> >> >> Thanks
>> >> >>
>> >> > Is the object recently modified?
>> >> >
>> >> >rados -p hpcfs_metadata stat 4.00000000
>> >> >
>> >>
>> >> $ rados -p hpcfs_metadata stat 4.00000000
>> >> hpcfs_metadata/4.00000000 mtime 2018-09-17 08:11:50.000000, size 0
>> >>
>> >please check if 4.00000000 has omap header and xattrs
>> >
>> >rados -p hpcfs_data listxattr 4.00000000
>> >
>> >rados -p hpcfs_data getomapheader 4.00000000
>> >
>>
>> Not sure if that was a typo^^ and whether you wanted the above commands run
>> on the 4.00000000 object in the metadata pool instead.
>> I ran the commands against both pools:
>>
>> $ rados -p hpcfs_data listxattr 4.00000000
>> error getting xattr set hpcfs_data/4.00000000: (2) No such file or directory
>> $ rados -p hpcfs_data getomapheader 4.00000000
>> error getting omap header hpcfs_data/4.00000000: (2) No such file or 
>> directory
>>
>> $ rados -p hpcfs_metadata listxattr 4.00000000
>> layout
>> parent
>> $ rados -p hpcfs_metadata getomapheader 4.00000000
>> header (274 bytes) :
>> 00000000  04 03 0c 01 00 00 01 00  00 00 00 00 00 00 00 00  |................|
>> 00000010  00 00 00 00 00 00 03 02  28 00 00 00 00 00 00 00  |........(.......|
>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
>> 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>> 00000040  00 00 00 00 03 02 28 00  00 00 00 00 00 00 00 00  |......(.........|
>> 00000050  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  |................|
>> 00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>> 00000070  00 00 03 02 38 00 00 00  00 00 00 00 00 00 00 00  |....8...........|
>> 00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>> *
>> 000000b0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 00 00  |..8.............|
>> 000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>> *
>> 000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 01 00  |................|
>> 000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>> *
>> 00000110  00 00                                             |..|
>> 00000112
>>
>> $ rados -p hpcfs_metadata getxattr 4.00000000 layout
>> ????????
>> $ rados -p hpcfs_metadata getxattr 4.00000000 parent
>> <
>> lost+found
>>
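
By the way, the layout xattr is binary, which is why it printed as "????????" 
above. A readable form can presumably be obtained the same way as for the 
parent xattr, assuming the attribute decodes as file_layout_t:

$ rados -p hpcfs_metadata getxattr 4.00000000 layout > /tmp/layout
$ ceph-dencoder import /tmp/layout type file_layout_t decode dump_json
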
>> >> >> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch <d...@unimelb.edu.au> 
>> >> >> >wrote:
>> >> >> >>
>> >> >> >> Hi all,
>> >> >> >>
>> >> >> >> We have a large omap object warning on one of our Ceph clusters.
>> >> >> >> The only reports I've seen from other users regarding the "large 
>> >> >> >> omap objects" warning were related to RGW bucket sharding; however, 
>> >> >> >> we do not have RGW configured on this cluster.
>> >> >> >> The large omap object (~10 GB) resides in a CephFS metadata pool.
>> >> >> >>
>> >> >> >> It's perhaps worth mentioning that we had to perform disaster 
>> >> >> >> recovery steps [1] on this cluster last year after a network issue, 
>> >> >> >> so we're not sure whether this large omap object is a result of 
>> >> >> >> those previous recovery processes or whether it's completely 
>> >> >> >> unrelated.
>> >> >> >>
>> >> >> >> Ceph version: 12.2.8
>> >> >> >> osd_objectstore: Bluestore
>> >> >> >> RHEL 7.5
>> >> >> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
>> >> >> >>
>> >> >> >> We have set: "mds_bal_fragment_size_max": "500000" (Default 100000)
>> >> >> >>
>> >> >> >> $ ceph health detail
>> >> >> >> HEALTH_WARN 1 large omap objects
>> >> >> >> LARGE_OMAP_OBJECTS 1 large omap objects
>> >> >> >>     1 large objects found in pool 'hpcfs_metadata'
>> >> >> >>     Search the cluster log for 'Large omap object found' for more 
>> >> >> >> details.
>> >> >> >>
>> >> >> >> # Find pg with large omap object
>> >> >> >> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query | grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
>> >> >> >> 20.103: 1
>> >> >> >>
>> >> >> >> # OSD log entry showing relevant object
>> >> >> >> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] 
>> >> >> >> Large omap object found. Object: 20:c0ce80d4:::4.00000000:head Key 
>> >> >> >> count: 24698995 Size (bytes): 11410935690
>> >> >> >>
>> >> >> >> # Confirm default warning thresholds for large omap object
>> >> >> >> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
>> >> >> >>     "osd_deep_scrub_large_omap_object_key_threshold": "2000000",
>> >> >> >>     "osd_deep_scrub_large_omap_object_value_sum_threshold": 
>> >> >> >> "1073741824",
>> >> >> >>
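
(For scale, the reported object is well past both defaults: 24,698,995 keys is 
roughly 12x the 2,000,000-key threshold, and 11,410,935,690 bytes is roughly 
10.6x the 1 GiB value-sum threshold.)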
>> >> >> >> # Dump keys/values of problematic object, creates 46.65GB file
>> >> >> >> $ rados -p hpcfs_metadata listomapvals '4.00000000' > /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> >> >> >> $ ll /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> >> >> >> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> >> >> >>
>> >> >> >> # Confirm key count matches OSD log entry warning
>> >> >> >> $ rados -p hpcfs_metadata listomapkeys '4.00000000' | wc -l
>> >> >> >> 24698995
>> >> >> >>
>> >> >> >> # The omap keys/vals for that object appear to have been 
>> >> >> >> unchanged/static for at least a couple of months:
>> >> >> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> >> >> >> fd00ceb68607b477626178b2d81fefb926460107  /tmp/hpcfs_metadata_object_omap_vals_4.00000000_20190304
>> >> >> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4_00000000_20190108
>> >> >> >> fd00ceb68607b477626178b2d81fefb926460107  /tmp/hpcfs_metadata_object_omap_vals_4_00000000_20190108
>> >> >> >>
>> >> >> >> I haven't gone through all 24698995 keys yet, but while most appear 
>> >> >> >> to relate to objects in the hpcfs_data CephFS data pool, there are 
>> >> >> >> a significant number of keys (rough guess 25%) that don't appear to 
>> >> >> >> have corresponding objects in the hpcfs_data pool.
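
A quick way to spot-check that correspondence, assuming the lost+found 
dentries are named by hex inode number as cephfs-data-scan normally creates 
them, is to take a few omap keys, strip the "_head" suffix, and stat the 
matching first data object in the data pool (<ino-hex> below is a placeholder):

$ rados -p hpcfs_metadata listomapkeys 4.00000000 | head -3
$ rados -p hpcfs_data stat <ino-hex>.00000000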
>> >> >> >>
>> >> >> >> Any assistance or pointers to troubleshoot further would be very 
>> >> >> >> much appreciated.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Dylan
>> >> >> >>
>> >> >> >> [1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/
>> >> >> >>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
