Robert,

Interesting results on the effect of the number of PGs/PGPs. My cluster struggles a bit under the strain of heavy random small-sized writes.

The IOPS you mention seem high to me given 30 drives and 3x replication unless they were pure reads or on high-rpm drives. Instead of assuming, I want to pose a few questions:

- How are you testing? rados bench, rbd bench, rbd bench with writeback cache, etc.?

- Were the 2000-2500 random 4k IOPS more reads than writes? If you test 100% 4k random reads, what do you get? If you test 100% 4k random writes, what do you get?

- What drives do you have? Any RAID involved under your OSDs?
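For the read/write split question above, a minimal rados bench sketch that separates the two cases; the pool name `testpool`, the 60-second duration, and the 16-thread concurrency are placeholders, not a prescription:

```shell
# 100% 4k writes for 60s; keep the objects so the read pass has data
rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup

# 100% 4k random reads against the objects written above
rados bench -p testpool 60 rand -t 16
```

Comparing the two IOPS numbers (and noting whether any RBD writeback cache is in play) should tell us whether the 2000-2500 figure was read-heavy.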

Thanks,
Mike Dawson


On 12/3/2013 1:31 AM, Robert van Leeuwen wrote:

On 2 dec. 2013, at 18:26, "Brian Andrus" <brian.and...@inktank.com> wrote:

Setting your pg_num and pgp_num to, say, 1024 would A) increase data
granularity, B) likely add no noticeable resource consumption, and
C) leave some room for future OSDs to be added while staying within the range of
acceptable PG numbers. You could probably safely double even that number if you
plan on expanding at a rapid rate and want to avoid splitting PGs every time a
node is added.

In general, you can conservatively err on the larger side when it comes to 
pg/p_num. Any excess resource utilization will be negligible (up to a certain 
point). If you have a comfortable amount of available RAM, you could experiment 
with increasing the multiplier in the equation you are using and see how it 
affects your final number.

The pg_num and pgp_num parameters can safely be changed before or after your 
new nodes are integrated.
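Concretely, raising both values on an existing pool looks like this; the pool name `rbd` and the target of 1024 are just the example numbers from above:

```shell
# increase placement groups first, then the placement mapping
ceph osd pool set rbd pg_num 1024
ceph osd pool set rbd pgp_num 1024
```

Note that pgp_num should be raised to match pg_num once the new PGs have been created, as data does not rebalance until pgp_num changes.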

I would be a bit conservative with the PGs/PGPs.
I've experimented with the PG number a bit and noticed the following drop in
random IO performance.
(This could be specific to our setup, but since pg_num is easy to increase and
impossible to decrease, I would err on the conservative side.)

  The setup:
3 OSD nodes with 128 GB RAM and two 6-core CPUs (12 threads with HT).
Each node has 10 OSDs on 1 TB disks, plus 2 SSDs for journals.

We use a replica count of 3, so the optimum according to the formula is about 1000 PGs.
With 1000 PGs I got about 2000-2500 random 4k IOPS.
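A minimal sketch of the rule of thumb behind that number, (OSDs x 100) / replicas rounded up to a power of two; the per-OSD target of 100 is the commonly cited default, and the function name is just for illustration:

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Rule-of-thumb pg_num: (OSDs * pgs_per_osd) / replicas,
    rounded up to the next power of two."""
    raw = num_osds * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power

# 3 nodes x 10 OSDs each, 3x replication
print(suggested_pg_count(30, 3))  # -> 1024
```

With the 30 OSDs above this gives 1000 before rounding, or 1024 at the nearest power of two, matching the figure suggested earlier in the thread.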

Because the nodes are fast enough and I expect the cluster to be expanded with
3 more nodes, I set the PGs to 2000.
Performance dropped to about 1200-1400 IOPS.

I noticed that the spinning disks were no longer maxing out at 100% utilization.
Memory and CPU did not seem to be a problem.
Since I had the option to recreate the pool and I was not using the recommended
settings, I did not really dive into the issue.
I will not stray too far from the recommended settings in the future, though :)

Cheers,
Robert van Leeuwen
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
