Not specifically set; according to the docs the default is off... I am using the async qemu Cuttlefish rpm. Maybe it does cache something, but I think not: specifically setting writeback on in the client config did yield different results. In our DEV environment we had issues with the virtual machines becoming unreachable under heavy IO load, so I have not enabled it on the live machines.
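For reference, enabling the RBD writeback cache on the client side would look roughly like the snippet below in ceph.conf. This is a minimal sketch: the option names are the ones documented for Ceph around this release, and the size values are purely illustrative assumptions, not recommendations.

    [client]
        # enable RBD writeback caching in librbd
        rbd cache = true
        # stay in writethrough mode until the guest issues its first flush
        rbd cache writethrough until flush = true
        # illustrative sizes only (bytes)
        rbd cache size = 33554432
        rbd cache max dirty = 25165824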
I am currently at home, but I could run a few tests tomorrow at work. Is there a way to do small random IO write tests with the rados tools? I gave it a quick glance and it looked like it only did large sequential writes. (A rough sketch of one possible approach is appended after the quoted thread below.)

Cheers,
Robert van Leeuwen

Sent from my iPad

> On 3 dec. 2013, at 17:02, "Mike Dawson" <mike.daw...@cloudapt.com> wrote:
>
> Robert,
>
> Do you have the rbd writeback cache enabled on these volumes? That could certainly explain the higher than expected write performance. Any chance you could re-test with rbd writeback on vs. off?
>
> Thanks,
> Mike Dawson
>
>> On 12/3/2013 10:37 AM, Robert van Leeuwen wrote:
>> Hi Mike,
>>
>> I am using filebench within a KVM virtual machine (like an actual workload we will have):
>> 100% synchronous 4k writes to a 50GB file on a 100GB volume with 32 writer threads.
>> Also tried from multiple KVM machines on multiple hosts.
>> Aggregated performance stays at 2k+ IOPS.
>>
>> The disks are 7200 RPM 2.5 inch drives, no RAID whatsoever.
>> I agree the number of IOPS seems high.
>> Maybe the journals on SSD (2 x Intel 3500) help a bit in this regard, but the SSDs were not maxed out yet.
>> The writes seem to be limited by the spinning disks:
>> as soon as the benchmark starts they are at 100% utilization.
>> The usage also drops to 0% pretty much immediately after the benchmark, so it looks like they are not lagging behind the journal.
>>
>> Did not really test reads yet; since we have so much read cache (128 GB per node) I assume we will mostly be write limited.
>>
>> Cheers,
>> Robert van Leeuwen
>>
>> Sent from my iPad
>>
>>> On 3 dec. 2013, at 16:15, "Mike Dawson" <mike.daw...@cloudapt.com> wrote:
>>>
>>> Robert,
>>>
>>> Interesting results on the effect of the number of PGs/PGPs. My cluster struggles a bit under the strain of heavy random small-sized writes.
>>>
>>> The IOPS you mention seem high to me given 30 drives and 3x replication, unless they were pure reads or on high-rpm drives. Instead of assuming, I want to pose a few questions:
>>>
>>> - How are you testing? rados bench, rbd bench, rbd bench with writeback cache, etc.?
>>>
>>> - Were the 2000-2500 random 4k IOPS more reads than writes? If you test 100% 4k random reads, what do you get? If you test 100% 4k random writes, what do you get?
>>>
>>> - What drives do you have? Any RAID involved under your OSDs?
>>>
>>> Thanks,
>>> Mike Dawson
>>>
>>>> On 12/3/2013 1:31 AM, Robert van Leeuwen wrote:
>>>>
>>>>> On 2 dec. 2013, at 18:26, "Brian Andrus" <brian.and...@inktank.com> wrote:
>>>>>
>>>>> Setting your pg_num and pgp_num to say... 1024 would A) increase data granularity, B) likely lend no noticeable increase in resource consumption, and C) allow some room for future OSDs to be added while still staying within the range of acceptable PG numbers. You could probably safely double even that number if you plan on expanding at a rapid rate and want to avoid splitting PGs every time a node is added.
>>>>>
>>>>> In general, you can conservatively err on the larger side when it comes to pg_num/pgp_num. Any excess resource utilization will be negligible (up to a certain point). If you have a comfortable amount of available RAM, you could experiment with increasing the multiplier in the equation you are using and see how it affects your final number.
>>>>>
>>>>> The pg_num and pgp_num parameters can safely be changed before or after your new nodes are integrated.
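To put the quoted advice in concrete terms, raising the placement group count on an existing pool takes two commands. A minimal sketch, assuming a pool named "volumes" (a placeholder) and the 1024 figure from the text above:

    # rough rule of thumb from the Ceph docs: (number of OSDs * 100) / replica count
    #   30 OSDs * 100 / 3 replicas = 1000, rounded up to the next power of two
    ceph osd pool set volumes pg_num 1024
    # pgp_num has to be raised as well before data is actually rebalanced
    ceph osd pool set volumes pgp_num 1024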
>>>>
>>>> I would be a bit conservative with the PGs / PGPs.
>>>> I've experimented with the PG number a bit and noticed the following random IO performance drop.
>>>> (This could be something specific to our setup, but since the PG count is easily increased and impossible to decrease, I would be conservative.)
>>>>
>>>> The setup:
>>>> 3 OSD nodes with 128 GB RAM, 2 x 6 core CPUs (12 with HT).
>>>> Nodes have 10 OSDs running on 1 TB disks and 2 SSDs for journals.
>>>>
>>>> We use a replica count of 3, so the optimum according to the formula is about 1000.
>>>> With 1000 PGs I got about 2000-2500 random 4k IOPS.
>>>>
>>>> Because the nodes are fast enough and I expect the cluster to be expanded with 3 more nodes, I set the PGs to 2000.
>>>> Performance dropped to about 1200-1400 IOPS.
>>>>
>>>> I noticed that the spinning disks were no longer maxing out at 100% usage.
>>>> Memory and CPU did not seem to be a problem.
>>>> Since I had the option to recreate the pool and I was not using the recommended settings, I did not really dive into the issue.
>>>> I will not stray too far from the recommended settings in the future though :)
>>>>
>>>> Cheers,
>>>> Robert van Leeuwen
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
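On the open question above about small random IO write tests with the rados tools: a minimal sketch of one rough approximation, assuming a throwaway pool named "scratch" (a placeholder) and that the installed rados version supports the -b and -t options. Note that rados bench writes whole new objects rather than random offsets inside an image, so it is only a crude stand-in for the filebench workload inside the guest:

    # throwaway pool for benchmarking; 128/128 PGs is just a placeholder choice
    ceph osd pool create scratch 128 128
    # 60 seconds of 4 KB object writes with 32 concurrent operations
    rados -p scratch bench 60 write -b 4096 -t 32

Running fio with a 4k random-write profile against a file on the RBD volume inside the guest would stay closer to the original filebench test.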