Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, On 1 Feb 2015 22:04, "Xu (Simon) Chen" wrote: > > Dan, > > I always have noout set, so that single OSD failures won't trigger any recovery immediately. When the OSD (or sometimes multiple OSDs on the same server) comes back, I do see slow requests during backfilling, but probably not thousands.
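For anyone reading this later, the noout workflow being discussed is roughly the following (standard ceph CLI, nothing cluster-specific):

    # keep the monitors from marking a down OSD "out", so no recovery/backfill starts
    ceph osd set noout
    # ...restart the OSD(s) or do the server maintenance...
    # then let the cluster behave normally again
    ceph osd unset noout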

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
Dan, I always have noout set, so that single OSD failures won't trigger any recovery immediately. When the OSD (or sometimes multiple OSDs on the same server) comes back, I do see slow requests during backfilling, but probably not thousands. When I added a brand new OSD into the cluster, for some r...

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, When do you see thousands of slow requests during recovery? Does that happen even with single OSD failures? You should be able to recover disks without slow requests. I always run with the recovery op priority at the minimum of 1. Tweaking the number of max backfills did not change much during that...
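A rough sketch of the throttling described here, assuming the usual injectable OSD options (the exact values below are only illustrative, not recommendations):

    # lower recovery priority relative to client I/O on all OSDs
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
    # limit concurrent backfill/recovery work per OSD
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

    # or persist the same settings in ceph.conf under [osd]:
    # osd recovery op priority = 1
    # osd max backfills = 1
    # osd recovery max active = 1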

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Udo Lembke
Hi Xu, On 01.02.2015 21:39, Xu (Simon) Chen wrote: > RBD doesn't work extremely well when Ceph is recovering - it is common > to see hundreds or a few thousand blocked requests (>30s to > finish). This translates to high IO wait inside the VMs, and many > applications don't deal with this well. th...

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
In my case, each object is 8MB (the Glance default for storing images on the RBD backend). RBD doesn't work extremely well when Ceph is recovering - it is common to see hundreds or a few thousand blocked requests (>30s to finish). This translates to high IO wait inside the VMs, and many applications don't...
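If it helps to quantify those blocked requests: the >30s figure presumably comes from osd_op_complaint_time (default 30 seconds), which is when an OSD starts reporting a request as slow; a quick way to watch them during recovery is something like:

    # list which OSDs currently report slow/blocked requests
    ceph health detail
    # or follow the cluster log live
    ceph -w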

Re: [ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Dan van der Ster
Hi, I don't know the general calculation, but last week we split a pool with 20 million tiny objects from 512 to 1024 PGs, on a cluster with 80 OSDs. IIRC around 7 million objects needed to move, and it took around 13 hours to finish. The bottleneck in our case was objects per second (limited to ar...
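Back-of-the-envelope from those numbers: 7,000,000 objects / (13 h x 3600 s/h) = 7,000,000 / 46,800 s, i.e. roughly 150 objects moved per second. So a crude estimate for a planned split is simply objects_that_will_move / achievable_objects_per_second, with the per-second rate measured on your own hardware.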

[ceph-users] estimate the impact of changing pg_num

2015-02-01 Thread Xu (Simon) Chen
Hi folks, I was running a Ceph cluster with 33 OSDs. More recently, 33x6 new OSDs (6 per server) hosted on 33 new servers were added, and I have finished rebalancing the data and then marked the 33 old OSDs out. As I now have 6x as many OSDs, I am thinking of increasing the pg_num of my largest pool from 1k to at least 8k...
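For reference, the actual change would be along these lines (the pool name and the 8192 target are placeholders for whatever is decided; data only starts moving once pgp_num is raised to match pg_num, so many operators raise both in smaller steps to spread the movement out):

    # split the placement groups
    ceph osd pool set <pool> pg_num 8192
    # start remapping data onto the new PGs
    ceph osd pool set <pool> pgp_num 8192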