Just for the sake of curiosity, if you do a show all on /cX/vX, what is shown for the VD properties?

> VD0 Properties :
> ==============
> Strip Size = 256 KB
> Number of Blocks = 1953374208
> VD has Emulated PD = No
> Span Depth = 1
> Number of Drives Per Span = 1
> Write Cache(initial setting) = WriteBack
> Disk Cache Policy = Disabled
> Encryption = None
> Data Protection = Disabled
> Active Operations = None
> Exposed to OS = Yes
> Creation Date = 17-06-2016
> Creation Time = 02:49:02 PM
> Emulation type = default
> Cachebypass size = Cachebypass-64k
> Cachebypass Mode = Cachebypass Intelligent
> Is LD Ready for OS Requests = Yes
> SCSI NAA Id = 600304801bb4c0001ef6ca5ea0fcb283
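
If it is easier, something along these lines should pull out just the cache-related fields (this assumes storcli is on the PATH and that the controller and VD really are /c0 and /v0 on your box; substitute your own numbers):

storcli /c0/v0 show all | grep -iE 'cache|policy'
storcli /c0 show all | grep -iE 'block ssd write disk cache change|fastpath'
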
I'm wondering whether the pdcache value has to be set at VD creation time, since it is a creation option as well. If that's the case, maybe consider blowing away one of the SSD VDs, recreating the VD and the OSD, and seeing whether you can measure a difference on that disk specifically in testing (rough command sketch at the bottom of this mail, below the quoted thread). It might also be helpful to document some of these values from /cX show all:

> Version :
> =======
> Firmware Package Build = 24.7.0-0026
> Firmware Version = 4.270.00-3972
> Bios Version = 6.22.03.0_4.16.08.00_0x060B0200
> Ctrl-R Version = 5.08-0006
> Preboot CLI Version = 01.07-05:#%0000
> NVDATA Version = 3.1411.00-0009
> Boot Block Version = 3.06.00.00-0001
> Driver Name = megaraid_sas
> Driver Version = 07.703.05.00-rc1
>
> Supported Adapter Operations :
> ============================
> Support Shield State = Yes
> Block SSD Write Disk Cache Change = Yes
> Support Suspend Resume BG ops = Yes
> Support Emergency Spares = Yes
> Support Set Link Speed = Yes
> Support Boot Time PFK Change = No
> Support JBOD = Yes
>
> Supported VD Operations :
> =======================
> Read Policy = Yes
> Write Policy = Yes
> IO Policy = Yes
> Access Policy = Yes
> Disk Cache Policy = Yes
> Reconstruction = Yes
> Deny Locate = No
> Deny CC = No
> Allow Ctrl Encryption = No
> Enable LDBBM = No
> Support FastPath = Yes
> Performance Metrics = Yes
> Power Savings = No
> Support Powersave Max With Cache = No
> Support Breakmirror = Yes
> Support SSC WriteBack = No
> Support SSC Association = No
> Support VD Hide = Yes
> Support VD Cachebypass = Yes
> Support VD discardCacheDuringLDDelete = Yes
>
>
> Advanced Software Option :
> ========================
>
> ----------------------------------------
> Adv S/W Opt         Time Remaining  Mode
> ----------------------------------------
> MegaRAID FastPath   Unlimited       -
> MegaRAID RAID6      Unlimited       -
> MegaRAID RAID5      Unlimited       -
> ----------------------------------------

Namely, on my 3108 controller, Block SSD Write Disk Cache Change = Yes stands out to me. My controller has SAS HDDs behind it, though, so I may simply not be running into the same issue that you are. I'm also wondering whether FastPath is enabled. I know that on some of the older controllers it was a paid feature, but it was later opened up for free, though you may still need a (free) software key to enable it. Just looking to widen the net and hope we catch something.

Reed

> On Sep 2, 2020, at 7:38 AM, VELARTIS Philipp Dürhammer
> <p.duerham...@velartis.at> wrote:
>
>>> I assume you are referencing this parameter?
>
>>> storcli /c0/v0 set ssdcaching=<on|off>
>
>>> If so, this is for CacheCade, which is LSI's cache tiering solution, which
>>> should both be off and not in use for ceph.
>
> No. storcli /cx/vx set pdcache=off is denied because of the LSI setting "Block
> SSD Write Disk Cache Change = Yes".
> I cannot find any firmware to upload or any other way to change this.
>
> Do you think that disabling the write cache on the SSDs as well would help a lot?
> (Ceph is not aware of this, because smartctl -g wcache /dev/sdX shows the cache
> as disabled - since the cache on the LSI side is already disabled.)
> The only way would be to buy some HBA cards and add them to the servers. But
> that's a lot of work, without knowing that it will improve the speed a lot.
>
> I am using RBD with hyperconverged nodes (4 at the moment); pools are 2x and
> 3x replicated. Actually the performance for the Windows and Linux VMs on the
> HDD OSD pool was OK, but over time it has been getting a little slower. I
> just want to get ready for the future, and we plan to put some bigger
> database servers on the cluster (they are on local storage at the moment), and
> therefore I want to increase the small random IOPS of the cluster a lot.
>
> -----Original Message-----
> From: Reed Dier <reed.d...@focusvq.com>
> Sent: Tuesday, 1 September 2020 23:44
> To: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Can 16 server grade ssd's be slower then 60 hdds?
> (no extra journals)
>
>> there is an option set in the controller "Block SSD Write Disk Cache Change
>> = Yes" which does not permit deactivating the SSD cache. I could not find
>> any solution on Google for this controller (LSI MegaRAID SAS 9271-8i) to
>> change this setting.
>
>
> I assume you are referencing this parameter?
>
> storcli /c0/v0 set ssdcaching=<on|off>
>
> If so, this is for CacheCade, which is LSI's cache tiering solution, which
> should both be off and not in use for ceph.
>
> Single thread and single iodepth benchmarks will tend to be underwhelming.
> Ceph shines with aggregate performance from lots of clients.
> And in an odd twist of fate, I typically see better performance on RBD for
> random benchmarks rather than sequential benchmarks, as it distributes the
> load across more OSDs.
>
> It might also help others offer some pointers for tuning if you describe the
> pool/application a bit more.
>
> I.e., RBD vs CephFS vs RGW, 3x replicated vs EC, etc.
>
> At least things are trending in a positive direction.
>
> Reed
>
>> On Sep 1, 2020, at 4:21 PM, VELARTIS Philipp Dürhammer
>> <p.duerham...@velartis.at> wrote:
>>
>> Thank you. I was working in this direction. The situation is a lot better,
>> but I think I can still get far better.
>>
>> I could set the controller to writethrough, direct and no read ahead for the
>> SSDs.
>> But I cannot disable the pdcache ☹ There is an option set in the controller
>> "Block SSD Write Disk Cache Change = Yes" which does not permit
>> deactivating the SSD cache. I could not find any solution on Google for this
>> controller (LSI MegaRAID SAS 9271-8i) to change this setting.
>>
>> I don't know how much performance gain deactivating the SSD cache will
>> bring. At least the Micron 5200 MAX has capacitors, so I hope it is safe
>> against data loss in case of power failure. I sent a request to LSI / Broadcom
>> asking whether they know how I can change this setting. This is really annoying.
>>
>> I will check the CPU power settings. I also read somewhere that it can
>> improve IOPS a lot (if it is set badly).
>>
>> At the moment I get 600 IOPS for 4k random writes with 1 thread and 1 iodepth.
>> I get 40K 4k random IOPS for some instances with 32 iodepth. It's not the world,
>> but a lot better than before. Reads are around 100k IOPS. That is for 16 SSDs
>> and 2 x dual 10G NICs.
>>
>> I was reading that good tuning and hardware config can get more than 2000
>> IOPS single-threaded out of the SSDs. I know that Ceph does not shine with a
>> single thread, but 600 IOPS is not very much...
>>
>> Philipp
>>
>> -----Original Message-----
>> From: Reed Dier <reed.d...@focusvq.com>
>> Sent: Tuesday, 1 September 2020 22:37
>> To: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>
>> Cc: ceph-users@ceph.io
>> Subject: Re: [ceph-users] Can 16 server grade ssd's be slower then 60 hdds?
>> (no extra journals)
>>
>> If using storcli/perccli for manipulating the LSI controller, you can
>> disable the on-disk write cache with:
>> storcli /cx/vx set pdcache=off
>>
>> You can also ensure that write caching is turned off at the controller level
>> with:
>> storcli /cx/vx set iopolicy=direct
>> storcli /cx/vx set wrcache=wt
>>
>> You can also tweak the readahead value for the VD if you want, though with
>> an SSD I don't think it will be much of an issue:
>> storcli /cx/vx set rdcache=nora
>>
>> I'm sure the megacli alternatives are available with some quick searches.
>>
>> You may also want to check your C-states and P-states to make sure there
>> aren't any aggressive power-saving features getting in the way.
>>
>> Reed
>>
>>> On Aug 31, 2020, at 7:44 AM, VELARTIS Philipp Dürhammer
>>> <p.duerham...@velartis.at> wrote:
>>>
>>> We have older LSI RAID controllers with no HBA/JBOD option, so we expose the
>>> single disks as RAID0 devices. Ceph should not be aware of the cache status?
>>> But digging deeper into it, it seems that 1 out of 4 servers is performing a
>>> lot better and has super low commit/apply latencies, while the others have a
>>> lot more (20+) on heavy writes. This applies only to the SSDs; for the HDDs I
>>> can't see a difference...
>>>
>>> -----Original Message-----
>>> From: Frank Schilder <fr...@dtu.dk>
>>> Sent: Monday, 31 August 2020 13:19
>>> To: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>;
>>> 'ceph-users@ceph.io' <ceph-users@ceph.io>
>>> Subject: Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra
>>> journals)
>>>
>>> Yes, they can - if the volatile write cache is not disabled. There are many
>>> threads on this, including recent ones. Search for "disable write cache"
>>> and/or "disable volatile write cache".
>>>
>>> You will also find different methods of doing this automatically.
>>>
>>> Best regards,
>>> =================
>>> Frank Schilder
>>> AIT Risø Campus
>>> Bygning 109, rum S14
>>>
>>> ________________________________________
>>> From: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>
>>> Sent: 31 August 2020 13:02:45
>>> To: 'ceph-users@ceph.io'
>>> Subject: [ceph-users] Can 16 server grade ssd's be slower then 60 hdds? (no
>>> extra journals)
>>>
>>> I have a productive 60-OSD cluster with no extra journals. It is performing
>>> okay. Now I added an extra SSD pool with 16 Micron 5100 MAX, and the
>>> performance is slightly slower than or equal to the 60-HDD pool, for 4K
>>> random as well as sequential reads. All on a dedicated 2 x 10G network. The
>>> HDDs are still on Filestore; the SSDs are on BlueStore. Ceph Luminous.
>>> What should be possible with 16 SSDs vs. 60 HDDs and no extra journals?
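
P.S. For the "set pdcache at creation time" experiment mentioned at the top, a rough sketch, purely as an illustration: the controller/VD/drive IDs (/c0, /v4, 252:4) are placeholders, the OSD on that disk would need to be drained and destroyed first (and redeployed afterwards), and whether the controller honours pdcache=off at creation despite "Block SSD Write Disk Cache Change = Yes" is exactly the open question:

storcli /c0/v4 show all                  # note the enclosure:slot backing this VD, e.g. 252:4
storcli /c0/v4 del force                 # destroys the VD; only after the OSD has been removed
storcli /c0 add vd type=raid0 drives=252:4 pdcache=off wt nora direct
storcli /c0/vall show all | grep -i 'disk cache'   # the new VD should report Disk Cache Policy = Disabled

And for the single-thread numbers in the quoted thread, a fio sketch against a throwaway RBD image (pool and image names are placeholders; fio needs to be built with rbd support). At iodepth=1 the result is essentially 1 / (average write latency), so it is more a latency measurement than a throughput one:

rbd create --size 10G ssdpool/fio-test
fio --name=4k-randwrite --ioengine=rbd --clientname=admin --pool=ssdpool \
    --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --direct=1 --time_based --runtime=60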