Hi,
I don't know the general calculation, but last week we split a pool with 20
million tiny objects from 512 to 1024 PGs, on a cluster with 80 OSDs. IIRC
around 7 million objects needed to move, and it took around 13 hours to
finish. The bottleneck in our case was objects per second (limited to
around 1000/s), not network throughput (which never exceeded ~50MB/s).

It wasn't completely transparent... the time to write a 4kB object
increased from 5ms to around 30ms during this splitting process.

I would guess that if you split from 1k to 8k PGs, around 80% of your data
will move. With an 8x increase, 7 out of 8 objects (87.5%) will be mapped to
a new primary PG, but any objects that end up with 2nd or 3rd copies on the
first 1k PGs should not need to be moved, which brings the total down a bit.
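
To make the 7-out-of-8 figure concrete, here is a quick back-of-the-envelope
simulation. It is not Ceph's real CRUSH/ceph_stable_mod code -- just a uniform
hash with a power-of-two mask, with made-up object names and counts -- but it
shows why roughly 87.5% of objects map to a different PG when pg_num goes
from 1k to 8k:

import hashlib

def pg_of(name, pg_num):
    # Ceph really uses rjenkins + ceph_stable_mod, but any uniform hash
    # gives the same statistics when pg_num is a power of two.
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return h & (pg_num - 1)   # power-of-two pg_num -> simple bit mask

old_pg_num, new_pg_num = 1024, 8192
n = 100000
moved = sum(pg_of("rbd_data.%x" % i, old_pg_num) !=
            pg_of("rbd_data.%x" % i, new_pg_num) for i in range(n))
print("%.1f%% of objects map to a new PG" % (100.0 * moved / n))  # ~87.5%

The gap between that and ~80% is roughly the replica copies that stay put.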

I'd also be interested to hear of similar splitting experiences. We've been
planning a similar intervention on our larger cluster to move from 4k PGs
to 16k. I have been considering making the change gradually (10-100 PGs at
a time) instead of all at once. This approach would certainly lower the
performance impact, but would take much, much longer to complete. I wrote a
short script to perform this gentle splitting here:
https://github.com/cernceph/ceph-scripts/blob/master/tools/split/ceph-gentle-split

Be sure to understand what it's doing before trying it.
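
In case it helps, the core of the gradual approach is roughly the following
(a sketch only, not the actual ceph-gentle-split script; the pool name, step
size and target below are made-up examples, and the real script does more
careful checking): raise pg_num and pgp_num in small increments and wait for
the cluster to return to HEALTH_OK before taking the next step.

#!/usr/bin/env python
import subprocess
import time

POOL, TARGET, STEP = "rbd", 8192, 64   # example values only

def current_pg_num():
    # "ceph osd pool get <pool> pg_num" prints e.g. "pg_num: 1024"
    out = subprocess.check_output(["ceph", "osd", "pool", "get", POOL, "pg_num"])
    return int(out.split()[-1])

def healthy():
    return subprocess.check_output(["ceph", "health"]).startswith(b"HEALTH_OK")

while current_pg_num() < TARGET:
    step_to = min(current_pg_num() + STEP, TARGET)
    subprocess.check_call(["ceph", "osd", "pool", "set", POOL,
                           "pg_num", str(step_to)])
    subprocess.check_call(["ceph", "osd", "pool", "set", POOL,
                           "pgp_num", str(step_to)])
    while not healthy():
        time.sleep(60)   # wait for splitting/backfill to settle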

Cheers,
Dan
On 1 Feb 2015 18:21, "Xu (Simon) Chen" <xche...@gmail.com> wrote:

> Hi folks,
>
> I was running a Ceph cluster with 33 OSDs. More recently, 33x6 new OSDs
> hosted on 33 new servers were added; I have finished rebalancing the data
> and marked the 33 old OSDs out.
>
> As I have 6x as many OSDs, I am thinking of increasing pg_num of my
> largest pool from 1k to at least 8k. What worries me is that this cluster
> has around 10M objects and is supporting many production VMs with RBD.
>
> I am wondering if there is a good way to estimate the amount of data that
> will be shuffled after I increase pg_num. I want to make sure this can be
> done within a reasonable amount of time, so that I can declare a proper
> maintenance window (either overnight, or over a weekend).
>
> Thanks!
> -Simon
>