Hi Jason,

>> Was the base RBD pool used only for data-pool associated images

Yes, it is only used for storing metadata of ecpool.

We use 2 pools for erasure coding:

ecpool - erasure coded data pool
vm     - replicated pool to store metadata
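
A sketch of that layout, with hypothetical pool names, PG counts, and
image size (on Luminous, RBD on an erasure coded data pool also needs
overwrites enabled):

# hypothetical names and PG counts
ceph osd pool create ecpool 128 128 erasure
ceph osd pool set ecpool allow_ec_overwrites true
ceph osd pool create vm 128 128 replicated
# image metadata lives in 'vm'; data objects go to 'ecpool'
rbd create vm/vm-disk-1 --size 100G --data-pool ecpool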
Karun Josy

On Tue, Jan 30, 2018 at 8:00 PM, Jason Dillaman <[email protected]> wrote:

> Unfortunately, any snapshots created prior to 12.2.2 against a separate
> data pool were incorrectly associated with the base image pool instead
> of the data pool. Was the base RBD pool used only for data-pool
> associated images (i.e. all the snapshots that exist within the pool
> can be safely deleted)?
>
> On Mon, Jan 29, 2018 at 11:50 AM, Karun Josy <[email protected]> wrote:
>
>> The problem we are experiencing is described here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1497332
>>
>> However, we are running 12.2.2.
>>
>> Across our 6 Ceph clusters, the one with the problem started as
>> version 12.2.0, then was upgraded to .1 and then to .2.
>>
>> The other 5 Ceph installations started as version 12.2.1 and were
>> then updated to .2.
>>
>> Karun Josy
>>
>> On Mon, Jan 29, 2018 at 7:01 PM, Karun Josy <[email protected]> wrote:
>>
>>> Thank you for your response.
>>>
>>> We don't think the issue is the cluster being behind on snap
>>> trimming. We just don't think snaptrim is occurring at all.
>>>
>>> We have 6 individual Ceph clusters. When we delete old snapshots for
>>> clients, we can see space being made available. In this particular
>>> one, however, with 300 virtual machines and 28 TB of data (this is
>>> our largest Ceph cluster), I can delete hundreds of snapshots and
>>> not a single gigabyte becomes available afterwards.
>>>
>>> In our other 5, smaller Ceph clusters, we can see hundreds of
>>> gigabytes becoming available again after massive deletions of
>>> snapshots.
>>>
>>> The Luminous GUI also never shows "snaptrimming" occurring in the EC
>>> pool, while on the other 5 Luminous clusters the GUI will show
>>> snaptrimming occurring for the EC pool, and within minutes we can
>>> see the additional space becoming available.
>>>
>>> This isn't an issue of the trimming queue being behind schedule. The
>>> system shows there is never any trimming scheduled in the queue.
>>>
>>> However, when using rbd du on particular virtual machines, we can
>>> see that snapshots we delete are indeed no longer listed in its
>>> output.
>>>
>>> So, they seem to be deleting. But the space is not being reclaimed.
>>>
>>> All clusters are the same hardware; some have more disks and servers
>>> than others. The only major difference is that this particular
>>> cluster with the problem had the noscrub and nodeep-scrub flags set
>>> for many weeks.
>>>
>>> Karun Josy
>>>
>>> On Mon, Jan 29, 2018 at 6:27 PM, David Turner <[email protected]>
>>> wrote:
>>>
>>>> I don't know why you keep asking the same question about snap
>>>> trimming. You haven't shown any evidence that your cluster is
>>>> behind on that. Have you looked into fstrim inside of your VMs?
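>>>>
>>>> For example, inside a Linux guest (assuming the virtual disk is
>>>> attached with discard/unmap enabled), something like this reports
>>>> how much space is released back to the storage:
>>>>
>>>> # run inside the VM, once per mounted filesystem
>>>> fstrim -v /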
>>>>
>>>> On Mon, Jan 29, 2018, 4:30 AM Karun Josy <[email protected]> wrote:
>>>>
>>>>> The fast-diff map is not enabled for the RBD images.
>>>>> Can that be a reason for trimming not happening?
>>>>>
>>>>> Karun Josy
>>>>>
>>>>> On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Thank you for your reply! I really appreciate it.
>>>>>>
>>>>>> The images are in pool id 55. It is an erasure coded pool.
>>>>>>
>>>>>> ---------------
>>>>>> $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 |
>>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 |
>>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 |
>>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> --------------
>>>>>>
>>>>>> The current snap_trim_sleep value is the default,
>>>>>> "osd_snap_trim_sleep": "0.000000". I assume that means there is
>>>>>> no delay. (I can't find any documentation related to it.)
>>>>>> Will changing its value initiate snaptrimming, like:
>>>>>>
>>>>>> ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'
>>>>>>
>>>>>> Also, we are using an rbd user with the profile below. It is used
>>>>>> while deleting snapshots:
>>>>>> -------
>>>>>> caps: [mon] profile rbd
>>>>>> caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm,
>>>>>>             profile rbd-read-only pool=templates
>>>>>> -------
>>>>>>
>>>>>> Can that be a reason?
>>>>>>
>>>>>> Also, can you let me know which logs to check while deleting
>>>>>> snapshots to see if it is snaptrimming?
>>>>>> I am sorry, I feel like I am pestering you too much.
>>>>>> But in the mailing lists, I can see you have dealt with similar
>>>>>> issues with snapshots, so I think you can help me figure this
>>>>>> mess out.
>>>>>>
>>>>>> Karun Josy
>>>>>>
>>>>>> On Sat, Jan 27, 2018 at 7:15 PM, David Turner <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Prove* a positive
>>>>>>>
>>>>>>> On Sat, Jan 27, 2018, 8:45 AM David Turner <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Unless you have things in your snap_trimq, your problem isn't
>>>>>>>> snap trimming. That is currently how you can check snap
>>>>>>>> trimming, and you say you're caught up.
>>>>>>>>
>>>>>>>> Are you certain that you are querying the correct pool for the
>>>>>>>> images you are snapshotting? You showed that you tested 4
>>>>>>>> different pools. You should only need to check the pool with
>>>>>>>> the images you are dealing with.
>>>>>>>>
>>>>>>>> You can inversely price a positive by changing your snap_trim
>>>>>>>> settings to not do any cleanup and see if the appropriate PGs
>>>>>>>> have anything in their queue.
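>>>>>>>>
>>>>>>>> One way to do that (a sketch; it assumes your release has the
>>>>>>>> nosnaptrim flag, which Luminous does) is to pause trimming
>>>>>>>> cluster-wide and watch whether deleted snapshots pile up in the
>>>>>>>> queues:
>>>>>>>>
>>>>>>>> ceph osd set nosnaptrim     # pause snap trimming on all OSDs
>>>>>>>> # delete a snapshot, then re-check snap_trimq on the pool's PGs
>>>>>>>> ceph osd unset nosnaptrim   # resume trimming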
>>>>>>>>
>>>>>>>> On Sat, Jan 27, 2018, 12:06 AM Karun Josy <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Are scrubbing and deep scrubbing necessary for the snaptrim
>>>>>>>>> operation to happen?
>>>>>>>>>
>>>>>>>>> Karun Josy
>>>>>>>>>
>>>>>>>>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you for your quick response!
>>>>>>>>>>
>>>>>>>>>> I used the command to fetch the snap_trimq from many PGs;
>>>>>>>>>> however, it seems they don't have any in the queue.
>>>>>>>>>>
>>>>>>>>>> For example:
>>>>>>>>>> ====================
>>>>>>>>>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[
>>>>>>>>>> -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>> 0
>>>>>>>>>> =====================
>>>>>>>>>>
>>>>>>>>>> While going through the PG query output, I find that these PGs
>>>>>>>>>> have no value in the purged_snaps section either.
>>>>>>>>>> For example:
>>>>>>>>>> ceph pg 55.80 query
>>>>>>>>>> ---
>>>>>>>>>> ---
>>>>>>>>>> {
>>>>>>>>>>     "peer": "83(3)",
>>>>>>>>>>     "pgid": "55.80s3",
>>>>>>>>>>     "last_update": "43360'15121927",
>>>>>>>>>>     "last_complete": "43345'15073146",
>>>>>>>>>>     "log_tail": "43335'15064480",
>>>>>>>>>>     "last_user_version": 15066124,
>>>>>>>>>>     "last_backfill": "MAX",
>>>>>>>>>>     "last_backfill_bitwise": 1,
>>>>>>>>>>     "purged_snaps": [],
>>>>>>>>>>     "history": {
>>>>>>>>>>         "epoch_created": 5950,
>>>>>>>>>>         "epoch_pool_created": 5950,
>>>>>>>>>>         "last_epoch_started": 43339,
>>>>>>>>>>         "last_interval_started": 43338,
>>>>>>>>>>         "last_epoch_clean": 43340,
>>>>>>>>>>         "last_interval_clean": 43338,
>>>>>>>>>>         "last_epoch_split": 0,
>>>>>>>>>>         "last_epoch_marked_full": 42032,
>>>>>>>>>>         "same_up_since": 43338,
>>>>>>>>>>         "same_interval_since": 43338,
>>>>>>>>>>         "same_primary_since": 43276,
>>>>>>>>>>         "last_scrub": "35299'13072533",
>>>>>>>>>>         "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>>>>>>>>>>         "last_deep_scrub": "31372'12176860",
>>>>>>>>>>         "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>>>>>>>>>>         "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>>>>>>>>>>     },
>>>>>>>>>>
>>>>>>>>>> Not sure if it is related.
>>>>>>>>>>
>>>>>>>>>> The cluster is not open to any new clients. However, we see a
>>>>>>>>>> steady growth of space usage every day.
>>>>>>>>>> And in the worst-case scenario, it might grow faster than we
>>>>>>>>>> can add more space, which would be dangerous.
>>>>>>>>>>
>>>>>>>>>> Any help is really appreciated.
>>>>>>>>>>
>>>>>>>>>> Karun Josy
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 26, 2018 at 8:23 PM, David Turner
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> "snap_trimq": "[]",
>>>>>>>>>>>
>>>>>>>>>>> That is exactly what you're looking for to see how many
>>>>>>>>>>> objects a PG still has that need to be cleaned up. I think
>>>>>>>>>>> something like this should give you the number of objects in
>>>>>>>>>>> the snap_trimq for a PG:
>>>>>>>>>>>
>>>>>>>>>>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2
>>>>>>>>>>> | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>>>>>
>>>>>>>>>>> Note, I'm not at a computer and typing this from my phone, so
>>>>>>>>>>> it's not pretty and I know of a few ways to do that better,
>>>>>>>>>>> but it should work all the same.
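>>>>>>>>>>>
>>>>>>>>>>> A slightly more readable equivalent (an untested sketch of
>>>>>>>>>>> the same pipeline, with a hypothetical PG id):
>>>>>>>>>>>
>>>>>>>>>>> pg=55.4a                               # hypothetical PG id
>>>>>>>>>>> line=$(ceph pg "$pg" query | grep snap_trimq)
>>>>>>>>>>> q=${line#*[}; q=${q%%]*}               # text between [ and ]
>>>>>>>>>>> echo "$q" | tr ',' '\n' | grep -c '~'  # entries look like
>>>>>>>>>>>                                        # "first~count"; prints
>>>>>>>>>>>                                        # 0 when the q is empty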
>>>>>>>>>>>
>>>>>>>>>>> For your needs, a visual inspection of several PGs should be
>>>>>>>>>>> sufficient to see if there is anything in the snap_trimq to
>>>>>>>>>>> begin with.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for the response. To be honest, I am afraid it is
>>>>>>>>>>>> going to be an issue in our cluster.
>>>>>>>>>>>> It seems snaptrim has not been going on for some time now,
>>>>>>>>>>>> maybe because we were expanding the cluster, adding nodes,
>>>>>>>>>>>> for the past few weeks.
>>>>>>>>>>>>
>>>>>>>>>>>> I would be really glad if you can guide me on how to
>>>>>>>>>>>> overcome this.
>>>>>>>>>>>> The cluster has about 30 TB of data and 11 million objects,
>>>>>>>>>>>> with about 100 disks spread across 16 nodes. The version is
>>>>>>>>>>>> 12.2.2.
>>>>>>>>>>>> Searching through the mailing lists, I can see many cases
>>>>>>>>>>>> where performance was affected while snaptrimming.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you help me figure out these:
>>>>>>>>>>>>
>>>>>>>>>>>> - How to find the snaptrim queue of a PG?
>>>>>>>>>>>> - Can snaptrim be started on just 1 PG?
>>>>>>>>>>>> - How can I make sure cluster IO performance is not
>>>>>>>>>>>>   affected? I read about osd_snap_trim_sleep; how can it be
>>>>>>>>>>>>   changed? Is this the command:
>>>>>>>>>>>>   ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'
>>>>>>>>>>>>   If yes, what is the recommended value that we can use?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, which parameters should we be concerned about? I would
>>>>>>>>>>>> really appreciate any suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>> Below is a brief extract of a PG query:
>>>>>>>>>>>> ----------------------------
>>>>>>>>>>>> ceph pg 55.77 query
>>>>>>>>>>>> {
>>>>>>>>>>>>     "state": "active+clean",
>>>>>>>>>>>>     "snap_trimq": "[]",
>>>>>>>>>>>> ---
>>>>>>>>>>>> ---
>>>>>>>>>>>>     "pgid": "55.77s7",
>>>>>>>>>>>>     "last_update": "43353'17222404",
>>>>>>>>>>>>     "last_complete": "42773'16814984",
>>>>>>>>>>>>     "log_tail": "42763'16812644",
>>>>>>>>>>>>     "last_user_version": 16814144,
>>>>>>>>>>>>     "last_backfill": "MAX",
>>>>>>>>>>>>     "last_backfill_bitwise": 1,
>>>>>>>>>>>>     "purged_snaps": [],
>>>>>>>>>>>>     "history": {
>>>>>>>>>>>>         "epoch_created": 5950,
>>>>>>>>>>>> ---
>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Karun Josy
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You may find the information in this ML thread useful.
>>>>>>>>>>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> It talks about a couple of ways to track your snaptrim
>>>>>>>>>>>>> queue.
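>>>>>>>>>>>>>
>>>>>>>>>>>>> A rough cluster-wide check (a sketch; it just counts PGs
>>>>>>>>>>>>> whose reported state currently includes a snaptrim phase):
>>>>>>>>>>>>>
>>>>>>>>>>>>> ceph pg dump pgs 2>/dev/null | grep -c snaptrim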
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph
>>>>>>>>>>>>>> cluster. When we are deleting snapshots, we are not
>>>>>>>>>>>>>> seeing any change in used space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I understand that Ceph OSDs delete data asynchronously,
>>>>>>>>>>>>>> so deleting a snapshot doesn't free up the disk space
>>>>>>>>>>>>>> immediately. But we are not seeing any change for some
>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What can be the possible reason? Any suggestions would be
>>>>>>>>>>>>>> really helpful, as the cluster size seems to be growing
>>>>>>>>>>>>>> each day even though snapshots are deleted.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karun
>
> --
> Jason
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
