Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, On 1 Feb 2015 22:04, "Xu (Simon) Chen" wrote: > > Dan, > > I always have noout set, so that single OSD failures won't trigger any recovery immediately. When the OSD (or sometimes multiple OSDs on the same server) comes back, I do see slow requests during backfilling, but probably not thousands.
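For anyone reading this later, the noout workflow being discussed is roughly the following (standard ceph CLI, nothing cluster-specific):

    # keep the monitors from marking a down OSD "out", so no recovery/backfill starts
    ceph osd set noout
    # ...restart the OSD(s) or do the server maintenance...
    # then let the cluster behave normally again
    ceph osd unset noout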

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
Dan, I always have noout set, so that single OSD failures won't trigger any recovery immediately. When the OSD (or sometimes multiple OSDs on the same server) comes back, I do see slow requests during backfilling, but probably not thousands. When I added a brand new OSD into the cluster, for some r...

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, When do you see thousands of slow requests during recovery? Does that happen even with single OSD failures? You should be able to recover disks without slow requests. I always run with the recovery op priority at the minimum of 1. Tweaking the number of max backfills did not change much during that...
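A rough sketch of the throttling described here, assuming the usual injectable OSD options (the exact values below are only illustrative, not recommendations):

    # lower recovery priority relative to client I/O on all OSDs
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
    # limit concurrent backfill/recovery work per OSD
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

    # or persist the same settings in ceph.conf under [osd]:
    # osd recovery op priority = 1
    # osd max backfills = 1
    # osd recovery max active = 1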

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Udo Lembke
Hi Xu, On 01.02.2015 21:39, Xu (Simon) Chen wrote: > RBD doesn't work extremely well when Ceph is recovering - it is common > to see hundreds or a few thousand blocked requests (>30s to > finish). This translates to high IO wait inside the VMs, and many > applications don't deal with this well. th...

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
In my case, each object is 8MB (the Glance default for storing images on the RBD backend). RBD doesn't work extremely well when Ceph is recovering - it is common to see hundreds or a few thousand blocked requests (>30s to finish). This translates to high IO wait inside the VMs, and many applications don't...
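If it helps to quantify those blocked requests: the >30s figure presumably comes from osd_op_complaint_time (default 30 seconds), which is when an OSD starts reporting a request as slow; a quick way to watch them during recovery is something like:

    # list which OSDs currently report slow/blocked requests
    ceph health detail
    # or follow the cluster log live
    ceph -w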

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, I don't know the general calculation, but last week we split a pool with 20 million tiny objects from 512 to 1024 PGs, on a cluster with 80 OSDs. IIRC around 7 million objects needed to move, and it took around 13 hours to finish. The bottleneck in our case was objects per second (limited to ar...
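Back-of-the-envelope from those numbers: 7,000,000 objects / (13 h x 3600 s/h) = 7,000,000 / 46,800 s, i.e. roughly 150 objects moved per second. So a crude estimate for a planned split is simply objects_that_will_move / achievable_objects_per_second, with the per-second rate measured on your own hardware.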

[ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
Hi folks, I was running a Ceph cluster with 33 OSDs. More recently, 33x6 new OSDs (6 per server) hosted on 33 new servers were added, and I have finished rebalancing the data and then marked the 33 old OSDs out. As I now have 6x as many OSDs, I am thinking of increasing the pg_num of my largest pool from 1k to at least 8k...
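For reference, the actual change would be along these lines (the pool name and the 8192 target are placeholders for whatever is decided; data only starts moving once pgp_num is raised to match pg_num, so many operators raise both in smaller steps to spread the movement out):

    # split the placement groups
    ceph osd pool set <pool> pg_num 8192
    # start remapping data onto the new PGs
    ceph osd pool set <pool> pgp_num 8192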