Well, it sounds like changing the pdcache setting may not be possible for SSDs, which is the first I've ever heard of this.
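For what it's worth, a quick way to check whether any given controller carries this restriction is to look for that flag in the adapter properties. A rough sketch, assuming storcli is installed as storcli64 and the controller is /c0:

  storcli64 /c0 show all | grep -i "Block SSD Write Disk Cache Change"

If that reports Yes, per-VD pdcache changes for SSD virtual disks will be rejected, which matches what you are seeing.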
I actually just checked another system that I forgot was behind a 3108 controller with SSDs (not ceph, so I wasn't considering it). It looks like I ran into the same issue during configuration back then, as that VD is set to "Default" and not "Disabled".

I think the best option is to try to configure it for "FastPath", which sadly has next to no documentation other than extolling its [purported] benefits.

> The per-virtual disk configuration required for Fast Path is:
> write-through cache policy
> no read-ahead
> direct (non-cache) I/O
>
> It provides the commands to set these (for adapter zero, logical disk zero):
> megacli -LDSetProp WT Direct NORA L0 a0

I believe the storcli equivalent(s) would be:

> storcli /cx/vx set wrcache=wt
> storcli /cx/vx set rdcache=nora
> storcli /cx/vx set iopolicy=direct

(A consolidated sketch of these, with a quick verification step, is at the very bottom of this mail, below the quoted thread.)

At this point that feels like the best option given the current constraints. I don't know why they don't publish those settings more readily, but at least we have the Google machine for that. Hopefully that may eke out a bit more performance.

Reed

> On Sep 3, 2020, at 5:31 AM, VELARTIS Philipp Dürhammer <p.duerham...@velartis.at> wrote:
>
> In theory it should be possible to do this (to change the Block SSD Write Disk Cache Change = Yes setting):
> Run MegaSCU -adpsettings -read -f mfc.ini -a0
> Edit the mfc.ini file, setting "blockSSDWriteCacheChange" to 0 instead of 1.
> Run MegaSCU -adpsettings -write -f mfc.ini -a0
> Trying this with MEGACLI, I get an error. It is not working for me to save the config file. No idea why…
>
> My Config:
>
> VD16 Properties :
> ===============
> Strip Size = 256 KB
> Number of Blocks = 3749642240
> VD has Emulated PD = Yes
> Span Depth = 1
> Number of Drives Per Span = 1
> Write Cache(initial setting) = WriteThrough
> Disk Cache Policy = Enabled
> Encryption = None
> Data Protection = Disabled
> Active Operations = None
> Exposed to OS = Yes
> Creation Date = 25-08-2020
> Creation Time = 12:05:41 PM
> Emulation type = None
>
> Version :
> =======
> Firmware Package Build = 23.28.0-0010
> Firmware Version = 3.400.05-3175
> Bios Version = 5.46.02.0_4.16.08.00_0x06060900
> NVDATA Version = 2.1403.03-0128
> Boot Block Version = 2.05.00.00-0010
> Bootloader Version = 07.26.26.219
> Driver Name = megaraid_sas
> Driver Version = 07.703.05.00-rc1
>
> Supported Adapter Operations :
> ============================
> Rebuild Rate = Yes
> CC Rate = Yes
> BGI Rate = Yes
> Reconstruct Rate = Yes
> Patrol Read Rate = Yes
> Alarm Control = Yes
> Cluster Support = No
> BBU = Yes
> Spanning = Yes
> Dedicated Hot Spare = Yes
> Revertible Hot Spares = Yes
> Foreign Config Import = Yes
> Self Diagnostic = Yes
> Allow Mixed Redundancy on Array = No
> Global Hot Spares = Yes
> Deny SCSI Passthrough = No
> Deny SMP Passthrough = No
> Deny STP Passthrough = No
> Support more than 8 Phys = Yes
> FW and Event Time in GMT = No
> Support Enhanced Foreign Import = Yes
> Support Enclosure Enumeration = Yes
> Support Allowed Operations = Yes
> Abort CC on Error = Yes
> Support Multipath = Yes
> Support Odd & Even Drive count in RAID1E = No
> Support Security = No
> Support Config Page Model = Yes
> Support the OCE without adding drives = Yes
> support EKM = No
> Snapshot Enabled = No
> Support PFK = Yes
> Support PI = Yes
> Support LDPI Type1 = No
> Support LDPI Type2 = No
> Support LDPI Type3 = No
> Support Ld BBM Info = No
> Support Shield State = Yes
> Block SSD Write Disk Cache Change = Yes -> this is not good, as it prevents changing the SSD cache! Stupid!
> Support Suspend Resume BG ops = Yes
> Support Emergency Spares = Yes
> Support Set Link Speed = Yes
> Support Boot Time PFK Change = No
> Support JBOD = Yes
> Disable Online PFK Change = No
> Support Perf Tuning = Yes
> Support SSD PatrolRead = Yes
> Real Time Scheduler = Yes
> Support Reset Now = Yes
> Support Emulated Drives = Yes
> Headless Mode = Yes
> Dedicated HotSpares Limited = No
> Point In Time Progress = Yes
>
> Supported VD Operations :
> =======================
> Read Policy = Yes
> Write Policy = Yes
> IO Policy = Yes
> Access Policy = Yes
> Disk Cache Policy = Yes (but only HDDs in this case)
> Reconstruction = Yes
> Deny Locate = No
> Deny CC = No
> Allow Ctrl Encryption = No
> Enable LDBBM = No
> Support FastPath = Yes
> Performance Metrics = Yes
> Power Savings = No
> Support Powersave Max With Cache = No
> Support Breakmirror = No
> Support SSC WriteBack = No
> Support SSC Association = No
>
> From: Reed Dier <reed.d...@focusvq.com>
> Sent: Wednesday, 02 September 2020 19:34
> To: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)
>
> Just for the sake of curiosity, if you do a show all on /cX/vX, what is shown for the VD properties?
>
> VD0 Properties :
> ==============
> Strip Size = 256 KB
> Number of Blocks = 1953374208
> VD has Emulated PD = No
> Span Depth = 1
> Number of Drives Per Span = 1
> Write Cache(initial setting) = WriteBack
> Disk Cache Policy = Disabled
> Encryption = None
> Data Protection = Disabled
> Active Operations = None
> Exposed to OS = Yes
> Creation Date = 17-06-2016
> Creation Time = 02:49:02 PM
> Emulation type = default
> Cachebypass size = Cachebypass-64k
> Cachebypass Mode = Cachebypass Intelligent
> Is LD Ready for OS Requests = Yes
> SCSI NAA Id = 600304801bb4c0001ef6ca5ea0fcb283
>
> I'm wondering if the pdcache value must be set at VD creation, as it is a creation option as well.
> If that's the case, maybe consider blowing away one of the SSD VDs and recreating the VD and OSD, and see if you can measure a difference on that disk specifically in testing.
> It might also be helpful to document some of these values from /cX show all
>
> Version :
> =======
> Firmware Package Build = 24.7.0-0026
> Firmware Version = 4.270.00-3972
> Bios Version = 6.22.03.0_4.16.08.00_0x060B0200
> Ctrl-R Version = 5.08-0006
> Preboot CLI Version = 01.07-05:#%0000
> NVDATA Version = 3.1411.00-0009
> Boot Block Version = 3.06.00.00-0001
> Driver Name = megaraid_sas
> Driver Version = 07.703.05.00-rc1
>
> Supported Adapter Operations :
> ============================
> Support Shield State = Yes
> Block SSD Write Disk Cache Change = Yes
> Support Suspend Resume BG ops = Yes
> Support Emergency Spares = Yes
> Support Set Link Speed = Yes
> Support Boot Time PFK Change = No
> Support JBOD = Yes
>
> Supported VD Operations :
> =======================
> Read Policy = Yes
> Write Policy = Yes
> IO Policy = Yes
> Access Policy = Yes
> Disk Cache Policy = Yes
> Reconstruction = Yes
> Deny Locate = No
> Deny CC = No
> Allow Ctrl Encryption = No
> Enable LDBBM = No
> Support FastPath = Yes
> Performance Metrics = Yes
> Power Savings = No
> Support Powersave Max With Cache = No
> Support Breakmirror = Yes
> Support SSC WriteBack = No
> Support SSC Association = No
> Support VD Hide = Yes
> Support VD Cachebypass = Yes
> Support VD discardCacheDuringLDDelete = Yes
>
> Advanced Software Option :
> ========================
>
> ----------------------------------------
> Adv S/W Opt          Time Remaining  Mode
> ----------------------------------------
> MegaRAID FastPath    Unlimited       -
> MegaRAID RAID6       Unlimited       -
> MegaRAID RAID5       Unlimited       -
> ----------------------------------------
>
> Namely, on my 3108 controller, "Block SSD Write Disk Cache Change = Yes" stands out to me.
> My controller has SAS HDDs behind it, though, so I just may not be running into the same issue that you are.
> Also wondering if FastPath is enabled as well. I know on some of the older controllers it was a paid feature, but they then opened it up, though you may need a software key to enable it (for free).
>
> Just looking to widen the net and hope we catch something.
>
> Reed
>
> On Sep 2, 2020, at 7:38 AM, VELARTIS Philipp Dürhammer <p.duerham...@velartis.at> wrote:
>
> I assume you are referencing this parameter?
>
> storcli /c0/v0 set ssdcaching=<on|off>
>
> If so, this is for CacheCade, which is LSI's cache tiering solution, which should both be off and not in use for ceph.
>
> No, storcli /cx/vx set pdcache=off is denied because of the LSI setting "Block SSD Write Disk Cache Change = Yes".
> I cannot find any firmware to upload or any way to change this.
>
> Do you think that disabling the write cache on the SSD itself would help a lot? (Ceph is not aware of this, because smartctl -g wcache /dev/sdX shows the cache disabled - because the cache on the LSI is disabled already.)
> The only way would be to buy some HBA cards and add them to the servers. But that's a lot of work, without knowing that it will improve the speed a lot.
>
> I am using RBD with hyperconverged nodes (4 at the moment); pools are 2 and 3 times replicated. Actually the performance for Windows and Linux VMs on the HDD OSD pool was OK, but it is getting a little slower with time. I just want to get ready for the future,
> and we plan to put some bigger database servers on the cluster (they are on local storage at the moment), and therefore I want to increase the small random IOPS of the cluster a lot.
>
> -----Original Message-----
> From: Reed Dier <reed.d...@focusvq.com>
> Sent: Tuesday, 01 September 2020 23:44
> To: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)
>
> there is an option set in the controller "Block SSD Write Disk Cache Change = Yes" which does not permit deactivating the SSD cache. I could not find any solution on Google for this controller (LSI MegaRAID SAS 9271-8i) to change this setting.
>
> I assume you are referencing this parameter?
>
> storcli /c0/v0 set ssdcaching=<on|off>
>
> If so, this is for CacheCade, which is LSI's cache tiering solution, which should both be off and not in use for ceph.
>
> Single thread and single iodepth benchmarks will tend to be underwhelming. Ceph shines with aggregate performance from lots of clients.
> And in an odd twist of fate, I typically see better performance on RBD for random benchmarks rather than sequential benchmarks, as it distributes the load across more OSDs.
>
> Might also help others offer some pointers for tuning if you describe the pool/application a bit more.
> I.e. RBD vs cephfs vs RGW, 3x replicated vs EC, etc.
>
> At least things are trending in a positive direction.
>
> Reed
>
> On Sep 1, 2020, at 4:21 PM, VELARTIS Philipp Dürhammer <p.duerham...@velartis.at> wrote:
>
> Thank you. I was working in this direction. The situation is a lot better, but I think I can still get far better.
>
> I could set the controller to writethrough, direct and no read ahead for the SSDs.
> But I cannot disable the pdcache ☹ There is an option set in the controller "Block SSD Write Disk Cache Change = Yes" which does not permit deactivating the SSD cache. I could not find any solution on Google for this controller (LSI MegaRAID SAS 9271-8i) to change this setting.
>
> I don't know how much performance gain it will be to deactivate the SSD cache. At least the Micron 5200 MAX has capacitors, so I hope it is safe against data loss in case of power failure. I wrote a request to LSI / Broadcom asking if they know how I can change this setting. This is really annoying.
>
> I will check the CPU power settings. I also read somewhere that it can improve IOPS a lot (if it's badly set).
>
> At the moment I get 600 IOPS 4k random write with 1 thread and 1 iodepth. I get 40K 4k random IOPS for some instances with 32 iodepth. It's not the world, but a lot better than before. Reads are around 100k IOPS. For 16 SSDs and 2 x dual 10G NICs.
>
> I was reading that good tuning and hardware config can get more than 2000 IOPS on a single thread out of the SSDs. I know that ceph does not shine with a single thread. But 600 IOPS is not very much...
>
> philipp
>
> -----Original Message-----
> From: Reed Dier <reed.d...@focusvq.com>
> Sent: Tuesday, 01 September 2020 22:37
> To: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Can 16 server grade ssd's be slower then 60 hdds?
> (no extra journals)
>
> If using storcli/perccli for manipulating the LSI controller, you can disable the on-disk write cache with:
> storcli /cx/vx set pdcache=off
>
> You can also ensure that you turn off write caching at the controller level with:
> storcli /cx/vx set iopolicy=direct
> storcli /cx/vx set wrcache=wt
>
> You can also tweak the readahead value for the VD if you want, though with an SSD, I don't think it will be much of an issue:
> storcli /cx/vx set rdcache=nora
>
> I'm sure the megacli alternatives are available with some quick searches.
>
> May also want to check your C-states and P-states to make sure there aren't any aggressive power-saving features getting in the way.
>
> Reed
>
> On Aug 31, 2020, at 7:44 AM, VELARTIS Philipp Dürhammer <p.duerham...@velartis.at> wrote:
>
> We have older LSI RAID controllers with no HBA/JBOD option, so we expose the single disks as RAID0 devices. Ceph should not be aware of the cache status?
> But digging deeper into it, it seems that 1 out of 4 servers is performing a lot better and has super low commit/apply rates, while the others have a lot more (20+) on heavy writes. This just applies to the SSDs; for the HDDs I can't see a difference...
>
> -----Original Message-----
> From: Frank Schilder <fr...@dtu.dk>
> Sent: Monday, 31 August 2020 13:19
> To: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>; 'ceph-users@ceph.io' <ceph-users@ceph.io>
> Subject: Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)
>
> Yes, they can - if volatile write cache is not disabled. There are many threads on this, also recent. Search for "disable write cache" and/or "disable volatile write cache".
>
> You will also find different methods of doing this automatically.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: VELARTIS Philipp Dürhammer <p.duerham...@velartis.at>
> Sent: 31 August 2020 13:02:45
> To: 'ceph-users@ceph.io'
> Subject: [ceph-users] Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)
>
> I have a productive 60 OSD cluster, no extra journals. It is performing okay. Now I added an extra SSD pool with 16 Micron 5100 MAX, and the performance is a little slower than or equal to the 60 HDD pool, for 4K random as well as sequential reads. All on a dedicated 2 x 10G network. The HDDs are still on filestore, the SSDs on bluestore. Ceph Luminous.
> What should be possible with 16 SSDs vs. 60 HDDs and no extra journals?
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
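As mentioned above, here is a consolidated sketch of the per-VD changes, plus a quick verification step. This assumes the SSD VDs are v0 through v15 on controller /c0 and that storcli is installed as storcli64; adjust the controller and VD numbers to your layout before running anything:

  # Set write-through, no read-ahead, and direct I/O on each SSD VD (the FastPath-style settings).
  for vd in $(seq 0 15); do
    storcli64 /c0/v${vd} set wrcache=wt
    storcli64 /c0/v${vd} set rdcache=nora
    storcli64 /c0/v${vd} set iopolicy=direct
  done

  # Then confirm a VD picked the settings up, e.g.:
  storcli64 /c0/v0 show all

After the change, "Write Cache(initial setting)" in the show all output should read WriteThrough, as it already does for your VD16 above.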
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io