Re: [ceph-users] rbd resize (shrink) taking forever and a day

Chen, Xiaoxi Tue, 06 Jan 2015 04:54:08 -0800

When you shrink an RBD image, most of the time is spent in trim_image() in 
librbd/internal.cc. In this function, the client iterates over every object in 
the range being removed (whether or not the object exists) and deletes it.

So in this case, when Edwin shrinks his RBD from 650PB to 650GB, there are 
[ (650PB * 1024TB/PB * 1024GB/TB - 650GB) * 1024MB/GB ] / 4MB/object = 
174,482,880,000 objects that need to be deleted. That will definitely take a 
long time, since the rbd client needs to send a delete request to the OSD for 
each object, and the OSD needs to look up the object context and delete the 
object (or find that it doesn't exist at all). The time needed to trim an image 
is proportional to the size being trimmed.
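
The object count can be checked with a short script. This is only a sketch: it 
assumes binary units (1 PB = 1024^5 bytes) and the default 4 MB object size 
(order 22). The helper name objects_to_trim and the delete rate used for the 
time estimate are hypothetical, not taken from librbd or measured on a cluster.

```python
MB = 1024 ** 2
GB = 1024 ** 3
PB = 1024 ** 5


def objects_to_trim(old_size, new_size, object_size=4 * MB):
    """Count the object slots between the new and the old image size.

    Hypothetical helper: it models the range a trim has to walk, it is not
    part of librbd.
    """
    # Round both sizes up to whole objects (ceiling division).
    old_objects = -(-old_size // object_size)
    new_objects = -(-new_size // object_size)
    return max(old_objects - new_objects, 0)


count = objects_to_trim(650 * PB, 650 * GB)
print(count)  # 174482880000

# Assumed delete rate, purely for illustration.
assumed_deletes_per_second = 10_000
days = count / assumed_deletes_per_second / 86400
print(f"about {days:.0f} days at {assumed_deletes_per_second} deletes/s")
```

The same function gives 268435456 objects for a 1024 TB image, which matches 
the rbd info output quoted further down in this thread.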

Making another image of the correct size, copying your VM's file system to the 
new image, and then deleting the old one will NOT help in general, because 
deleting the old volume takes exactly the same time as shrinking it: both 
need to call trim_image().

The solution I have in mind is that we could provide a "--skip-trimming" flag 
to skip the trimming. When the administrator is absolutely sure that no writes 
have taken place in the area being removed (meaning no objects were created in 
that area), they can use this flag to skip the time-consuming trimming.

What do you think?

From: Jake Young [mailto:[email protected]]
Sent: Monday, January 5, 2015 9:45 PM
To: Chen, Xiaoxi
Cc: Edwin Peer; [email protected]
Subject: Re: [ceph-users] rbd resize (shrink) taking forever and a day

On Sunday, January 4, 2015, Chen, Xiaoxi 
<[email protected]> wrote:
You could use rbd info <volume_name> to see the block_name_prefix. Object 
names have the form <block_name_prefix>.<sequence_number>, so for example 
rb.0.ff53.3d1b58ba.00000000e6ad should be the 0xe6ad-th object of the volume 
with block_name_prefix rb.0.ff53.3d1b58ba.

     $ rbd info huge
        rbd image 'huge':
         size 1024 TB in 268435456 objects
         order 22 (4096 kB objects)
         block_name_prefix: rb.0.8a14.2ae8944a
         format: 1
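
As a rough illustration of that naming scheme, the sketch below splits an 
object name into its prefix and index and picks out the objects that belong to 
one image. It assumes the suffix is a hexadecimal sequence number, as the 
"e6ad" reading above implies, and a 4 MB object size (order 22). The function 
names are made up for this example; the object names are the ones Edwin listed.

```python
def parse_object_name(name):
    """Split an RBD data object name into (block_name_prefix, object index)."""
    prefix, _, suffix = name.rpartition(".")
    # Assumption: the suffix is the object's sequence number in hexadecimal.
    return prefix, int(suffix, 16)


def objects_of_image(names, block_name_prefix):
    """Keep only the object names that start with the given prefix."""
    return [n for n in names if n.startswith(block_name_prefix + ".")]


# Object names copied from the listing earlier in the thread.
names = [
    "rb.0.ff53.3d1b58ba.00000000e6ad",
    "rb.0.6d386.1d545c4d.000000011461",
    "rb.0.50703.3804823e.000000001c28",
    "rb.0.1073e.3d1b58ba.00000000b715",
    "rb.0.1d76.2ae8944a.00000000022d",
]

prefix, index = parse_object_name(names[0])
print(prefix, index)  # rb.0.ff53.3d1b58ba 59053

# Byte offset of that object inside the image, assuming 4 MB objects.
print(index * 4 * 1024 ** 2)

print(objects_of_image(names, "rb.0.ff53.3d1b58ba"))
```

In practice the names would come from rados -p <pool> ls rather than a 
hard-coded list.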

-----Original Message-----
From: ceph-users [mailto:[email protected]] On Behalf Of Edwin Peer
Sent: Monday, January 5, 2015 3:55 AM
To: [email protected]
Subject: Re: [ceph-users] rbd resize (shrink) taking forever and a day

Also, which rbd objects are of interest?

<snip>
ganymede ~ # rados -p client-disk-img0 ls | wc -l
1672636
</snip>

And, all of them have cryptic names like:

rb.0.ff53.3d1b58ba.00000000e6ad
rb.0.6d386.1d545c4d.000000011461
rb.0.50703.3804823e.000000001c28
rb.0.1073e.3d1b58ba.00000000b715
rb.0.1d76.2ae8944a.00000000022d

which seem to bear no resemblance to the actual image names that the rbd 
command line tools understand?

Regards,
Edwin Peer

On 01/04/2015 08:48 PM, Jake Young wrote:
>
>
> On Sunday, January 4, 2015, Dyweni - Ceph-Users
> <[email protected]> wrote:
>
>     Hi,
>
>     If it's the only thing in your pool, you could try deleting the
>     pool instead.
>
>     I found that to be faster in my testing; I had created 500TB when
>     I meant to create 500GB.
>
>     Note for the Devs: It would be nice if rbd create/resize would
>     accept sizes with units (i.e. MB GB TB PB, etc).
>
>
>
>
>     On 2015-01-04 08:45, Edwin Peer wrote:
>
>         Hi there,
>
>         I did something stupid while growing an rbd image. I accidentally
>         mistook the units of the resize command for bytes instead of
>         megabytes and grew an rbd image to 650PB instead of 650GB. This all
>         happened instantaneously enough, but trying to rectify the mistake
>         is not going nearly as well.
>
>         <snip>
>         ganymede ~ # rbd resize --size 665600 --allow-shrink
>         client-disk-img0/vol-x318644f-0
>         Resizing image: 1% complete...
>         </snip>
>
>         It took a couple days before it started showing 1% complete and
>         has been stuck on 1% for a couple more. At this rate, I should be
>         able to shrink the image back to the intended size in about 2016.
>
>         Any ideas?
>
>         Regards,
>         Edwin Peer
>         _______________________________________________
>         ceph-users mailing list
>         [email protected]
>         http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> You can just delete the rbd header. See Sebastien's excellent blog:
>
> http://www.sebastien-han.fr/blog/2013/12/12/rbd-image-bigger-than-your-ceph-cluster/
>
> Jake
>

Sorry, I misunderstood.

The simplest approach to me is to make another image of the correct size and 
copy your VM's file system to the new image, then delete the old one.

The safest thing to do would be to mount the new file system from the VM and do 
all the formatting / copying from there (the same way you'd move a physical 
server's root disk to a new physical disk).

I would not attempt to hack the rbd header. You open yourself up to some 
unforeseen problems.

That is, unless one of the Ceph developers can comment on whether there is a 
safe way to shrink an image, assuming we know that the file system has not 
grown since the disk was grown.

Jake
