Hi Jason,

>> Was the base RBD pool used only for data-pool associated images

Yes, it is only used for storing metadata of ecpool.

We use 2 pools for erasure coding:

ecpool - erasure coded data pool
vm     - replicated pool to store metadata
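
A sketch of that layout, with hypothetical pool names, PG counts, and
image size (on Luminous, RBD on an erasure coded data pool also needs
overwrites enabled):

# hypothetical names and PG counts
ceph osd pool create ecpool 128 128 erasure
ceph osd pool set ecpool allow_ec_overwrites true
ceph osd pool create vm 128 128 replicated
# image metadata lives in 'vm'; data objects go to 'ecpool'
rbd create vm/vm-disk-1 --size 100G --data-pool ecpool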
Karun Josy

On Tue, Jan 30, 2018 at 8:00 PM, Jason Dillaman <[email protected]> wrote:

> Unfortunately, any snapshots created prior to 12.2.2 against a separate
> data pool were incorrectly associated with the base image pool instead
> of the data pool. Was the base RBD pool used only for data-pool
> associated images (i.e. all the snapshots that exist within the pool
> can be safely deleted)?
>
> On Mon, Jan 29, 2018 at 11:50 AM, Karun Josy <[email protected]> wrote:
>
>> The problem we are experiencing is described here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1497332
>>
>> However, we are running 12.2.2.
>>
>> Across our 6 Ceph clusters, the one with the problem started as
>> version 12.2.0, then was upgraded to .1 and then to .2.
>>
>> The other 5 Ceph installations started as version 12.2.1 and were
>> then updated to .2.
>>
>> Karun Josy
>>
>> On Mon, Jan 29, 2018 at 7:01 PM, Karun Josy <[email protected]> wrote:
>>
>>> Thank you for your response.
>>>
>>> We don't think the issue is the cluster being behind on snap
>>> trimming. We just don't think snaptrim is occurring at all.
>>>
>>> We have 6 individual Ceph clusters. When we delete old snapshots for
>>> clients, we can see space being made available. In this particular
>>> one, however, with 300 virtual machines and 28 TB of data (this is
>>> our largest Ceph cluster), I can delete hundreds of snapshots and
>>> not a single gigabyte becomes available afterwards.
>>>
>>> In our other 5, smaller Ceph clusters, we can see hundreds of
>>> gigabytes becoming available again after massive deletions of
>>> snapshots.
>>>
>>> The Luminous GUI also never shows "snaptrimming" occurring in the EC
>>> pool, while on the other 5 Luminous clusters the GUI will show
>>> snaptrimming occurring for the EC pool, and within minutes we can
>>> see the additional space becoming available.
>>>
>>> This isn't an issue of the trimming queue being behind schedule. The
>>> system shows there is never any trimming scheduled in the queue.
>>>
>>> However, when using rbd du on particular virtual machines, we can
>>> see that snapshots we delete are indeed no longer listed in its
>>> output.
>>>
>>> So, they seem to be deleting. But the space is not being reclaimed.
>>>
>>> All clusters are the same hardware; some have more disks and servers
>>> than others. The only major difference is that this particular
>>> cluster with the problem had the noscrub and nodeep-scrub flags set
>>> for many weeks.
>>>
>>> Karun Josy
>>>
>>> On Mon, Jan 29, 2018 at 6:27 PM, David Turner <[email protected]>
>>> wrote:
>>>
>>>> I don't know why you keep asking the same question about snap
>>>> trimming. You haven't shown any evidence that your cluster is
>>>> behind on that. Have you looked into fstrim inside of your VMs?
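>>>>
>>>> For example, inside a Linux guest (assuming the virtual disk is
>>>> attached with discard/unmap enabled), something like this reports
>>>> how much space is released back to the storage:
>>>>
>>>> # run inside the VM, once per mounted filesystem
>>>> fstrim -v /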
>>>>
>>>> On Mon, Jan 29, 2018, 4:30 AM Karun Josy <[email protected]> wrote:
>>>>
>>>>> The fast-diff map is not enabled for the RBD images.
>>>>> Can that be a reason for trimming not happening?
>>>>>
>>>>> Karun Josy
>>>>>
>>>>> On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Thank you for your reply! I really appreciate it.
>>>>>>
>>>>>> The images are in pool id 55. It is an erasure coded pool.
>>>>>>
>>>>>> ---------------
>>>>>> $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 |
>>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 |
>>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 |
>>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> --------------
>>>>>>
>>>>>> The current snap_trim_sleep value is the default,
>>>>>> "osd_snap_trim_sleep": "0.000000". I assume that means there is
>>>>>> no delay. (I can't find any documentation related to it.)
>>>>>> Will changing its value initiate snaptrimming, like:
>>>>>>
>>>>>> ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'
>>>>>>
>>>>>> Also, we are using an rbd user with the profile below. It is used
>>>>>> while deleting snapshots:
>>>>>> -------
>>>>>> caps: [mon] profile rbd
>>>>>> caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm,
>>>>>>             profile rbd-read-only pool=templates
>>>>>> -------
>>>>>>
>>>>>> Can that be a reason?
>>>>>>
>>>>>> Also, can you let me know which logs to check while deleting
>>>>>> snapshots to see if it is snaptrimming?
>>>>>> I am sorry, I feel like I am pestering you too much.
>>>>>> But in the mailing lists, I can see you have dealt with similar
>>>>>> issues with snapshots, so I think you can help me figure this
>>>>>> mess out.
>>>>>>
>>>>>> Karun Josy
>>>>>>
>>>>>> On Sat, Jan 27, 2018 at 7:15 PM, David Turner <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Prove* a positive
>>>>>>>
>>>>>>> On Sat, Jan 27, 2018, 8:45 AM David Turner <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Unless you have things in your snap_trimq, your problem isn't
>>>>>>>> snap trimming. That is currently how you can check snap
>>>>>>>> trimming, and you say you're caught up.
>>>>>>>>
>>>>>>>> Are you certain that you are querying the correct pool for the
>>>>>>>> images you are snapshotting? You showed that you tested 4
>>>>>>>> different pools. You should only need to check the pool with
>>>>>>>> the images you are dealing with.
>>>>>>>>
>>>>>>>> You can inversely price a positive by changing your snap_trim
>>>>>>>> settings to not do any cleanup and see if the appropriate PGs
>>>>>>>> have anything in their queue.
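>>>>>>>>
>>>>>>>> One way to do that (a sketch; it assumes your release has the
>>>>>>>> nosnaptrim flag, which Luminous does) is to pause trimming
>>>>>>>> cluster-wide and watch whether deleted snapshots pile up in the
>>>>>>>> queues:
>>>>>>>>
>>>>>>>> ceph osd set nosnaptrim     # pause snap trimming on all OSDs
>>>>>>>> # delete a snapshot, then re-check snap_trimq on the pool's PGs
>>>>>>>> ceph osd unset nosnaptrim   # resume trimming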
>>>>>>>>
>>>>>>>> On Sat, Jan 27, 2018, 12:06 AM Karun Josy <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Are scrubbing and deep scrubbing necessary for the snaptrim
>>>>>>>>> operation to happen?
>>>>>>>>>
>>>>>>>>> Karun Josy
>>>>>>>>>
>>>>>>>>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you for your quick response!
>>>>>>>>>>
>>>>>>>>>> I used the command to fetch the snap_trimq from many PGs;
>>>>>>>>>> however, it seems they don't have any in the queue.
>>>>>>>>>>
>>>>>>>>>> For example:
>>>>>>>>>> ====================
>>>>>>>>>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> =====================
>>>>>>>>>>
>>>>>>>>>> While going through the PG query output, I find that these PGs
>>>>>>>>>> have no value in the purged_snaps section either.
>>>>>>>>>> For example:
>>>>>>>>>> ceph pg 55.80 query
>>>>>>>>>> ---
>>>>>>>>>> ---
>>>>>>>>>> {
>>>>>>>>>>     "peer": "83(3)",
>>>>>>>>>>     "pgid": "55.80s3",
>>>>>>>>>>     "last_update": "43360'15121927",
>>>>>>>>>>     "last_complete": "43345'15073146",
>>>>>>>>>>     "log_tail": "43335'15064480",
>>>>>>>>>>     "last_user_version": 15066124,
>>>>>>>>>>     "last_backfill": "MAX",
>>>>>>>>>>     "last_backfill_bitwise": 1,
>>>>>>>>>>     "purged_snaps": [],
>>>>>>>>>>     "history": {
>>>>>>>>>>         "epoch_created": 5950,
>>>>>>>>>>         "epoch_pool_created": 5950,
>>>>>>>>>>         "last_epoch_started": 43339,
>>>>>>>>>>         "last_interval_started": 43338,
>>>>>>>>>>         "last_epoch_clean": 43340,
>>>>>>>>>>         "last_interval_clean": 43338,
>>>>>>>>>>         "last_epoch_split": 0,
>>>>>>>>>>         "last_epoch_marked_full": 42032,
>>>>>>>>>>         "same_up_since": 43338,
>>>>>>>>>>         "same_interval_since": 43338,
>>>>>>>>>>         "same_primary_since": 43276,
>>>>>>>>>>         "last_scrub": "35299'13072533",
>>>>>>>>>>         "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>>>>>>>>>>         "last_deep_scrub": "31372'12176860",
>>>>>>>>>>         "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>>>>>>>>>>         "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>>>>>>>>>>     },
>>>>>>>>>>
>>>>>>>>>> Not sure if it is related.
>>>>>>>>>>
>>>>>>>>>> The cluster is not open to any new clients. However, we see a
>>>>>>>>>> steady growth of space usage every day.
>>>>>>>>>> And in the worst-case scenario, it might grow faster than we
>>>>>>>>>> can add more space, which would be dangerous.
>>>>>>>>>>
>>>>>>>>>> Any help is really appreciated.
>>>>>>>>>>
>>>>>>>>>> Karun Josy
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 26, 2018 at 8:23 PM, David Turner
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> "snap_trimq": "[]",
>>>>>>>>>>>
>>>>>>>>>>> That is exactly what you're looking for to see how many
>>>>>>>>>>> objects a PG still has that need to be cleaned up. I think
>>>>>>>>>>> something like this should give you the number of objects in
>>>>>>>>>>> the snap_trimq for a PG:
>>>>>>>>>>>
>>>>>>>>>>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2
>>>>>>>>>>> | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>>>
>>>>>>>>>>> Note, I'm not at a computer and typing this from my phone, so
>>>>>>>>>>> it's not pretty and I know of a few ways to do that better,
>>>>>>>>>>> but it should work all the same.
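>>>>>>>>>>>
>>>>>>>>>>> A slightly more readable equivalent (an untested sketch of
>>>>>>>>>>> the same pipeline, with a hypothetical PG id):
>>>>>>>>>>>
>>>>>>>>>>> pg=55.4a                               # hypothetical PG id
>>>>>>>>>>> line=$(ceph pg "$pg" query | grep snap_trimq)
>>>>>>>>>>> q=${line#*[}; q=${q%%]*}               # text between [ and ]
>>>>>>>>>>> echo "$q" | tr ',' '\n' | grep -c '~'  # entries look like
>>>>>>>>>>>                                        # "first~count"; prints
>>>>>>>>>>>                                        # 0 when the q is empty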
>>>>>>>>>>>
>>>>>>>>>>> For your needs, a visual inspection of several PGs should be
>>>>>>>>>>> sufficient to see if there is anything in the snap_trimq to
>>>>>>>>>>> begin with.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for the response. To be honest, I am afraid it is
>>>>>>>>>>>> going to be an issue in our cluster.
>>>>>>>>>>>> It seems snaptrim has not been going on for some time now,
>>>>>>>>>>>> maybe because we were expanding the cluster, adding nodes,
>>>>>>>>>>>> for the past few weeks.
>>>>>>>>>>>>
>>>>>>>>>>>> I would be really glad if you can guide me on how to
>>>>>>>>>>>> overcome this.
>>>>>>>>>>>> The cluster has about 30 TB of data and 11 million objects,
>>>>>>>>>>>> with about 100 disks spread across 16 nodes. The version is
>>>>>>>>>>>> 12.2.2.
>>>>>>>>>>>> Searching through the mailing lists, I can see many cases
>>>>>>>>>>>> where performance was affected while snaptrimming.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you help me figure out these:
>>>>>>>>>>>>
>>>>>>>>>>>> - How to find the snaptrim queue of a PG?
>>>>>>>>>>>> - Can snaptrim be started on just 1 PG?
>>>>>>>>>>>> - How can I make sure cluster IO performance is not
>>>>>>>>>>>>   affected? I read about osd_snap_trim_sleep; how can it be
>>>>>>>>>>>>   changed? Is this the command:
>>>>>>>>>>>>   ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'
>>>>>>>>>>>>   If yes, what is the recommended value that we can use?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, which parameters should we be concerned about? I would
>>>>>>>>>>>> really appreciate any suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>> Below is a brief extract of a PG query:
>>>>>>>>>>>> ----------------------------
>>>>>>>>>>>> ceph pg 55.77 query
>>>>>>>>>>>> {
>>>>>>>>>>>>     "state": "active+clean",
>>>>>>>>>>>>     "snap_trimq": "[]",
>>>>>>>>>>>> ---
>>>>>>>>>>>> ---
>>>>>>>>>>>>     "pgid": "55.77s7",
>>>>>>>>>>>>     "last_update": "43353'17222404",
>>>>>>>>>>>>     "last_complete": "42773'16814984",
>>>>>>>>>>>>     "log_tail": "42763'16812644",
>>>>>>>>>>>>     "last_user_version": 16814144,
>>>>>>>>>>>>     "last_backfill": "MAX",
>>>>>>>>>>>>     "last_backfill_bitwise": 1,
>>>>>>>>>>>>     "purged_snaps": [],
>>>>>>>>>>>>     "history": {
>>>>>>>>>>>>         "epoch_created": 5950,
>>>>>>>>>>>> ---
>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Karun Josy
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You may find the information in this ML thread useful.
>>>>>>>>>>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> It talks about a couple of ways to track your snaptrim
>>>>>>>>>>>>> queue.
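>>>>>>>>>>>>>
>>>>>>>>>>>>> A rough cluster-wide check (a sketch; it just counts PGs
>>>>>>>>>>>>> whose reported state currently includes a snaptrim phase):
>>>>>>>>>>>>>
>>>>>>>>>>>>> ceph pg dump pgs 2>/dev/null | grep -c snaptrim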
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph
>>>>>>>>>>>>>> cluster. When we are deleting snapshots, we are not
>>>>>>>>>>>>>> seeing any change in used space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I understand that Ceph OSDs delete data asynchronously,
>>>>>>>>>>>>>> so deleting a snapshot doesn't free up the disk space
>>>>>>>>>>>>>> immediately. But we are not seeing any change for some
>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What can be the possible reason? Any suggestions would be
>>>>>>>>>>>>>> really helpful, as the cluster size seems to be growing
>>>>>>>>>>>>>> each day even though snapshots are deleted.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karun
>
> --
> Jason
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
