Hi All,

Can someone please confirm that, for an optimal performance/safety compromise, the following would be the best settings (id 0 is the SSD, id 1 is the HDD)? Alternatively, any suggestions, shared configurations, or advice would be greatly appreciated.
Note: the server is a Dell R620 with a PERC H710 (1 GB cache). The SSD is an enterprise Toshiba PX05SMB040Y and the HDD is an enterprise Seagate ST600MM0006.

megacli -LDGetProp -DskCache -Lall -a0

Adapter 0-VD 0(target id: 0): Disk Write Cache : Enabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disabled

megacli -LDGetProp -Cache -Lall -a0

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Cached, Write Cache OK if bad BBU
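In case it helps anyone reading, these policies can be set per virtual disk with MegaCli's LDSetProp options. This is only a sketch - the -L/-a indexes match my layout above, so adjust them for your own virtual disks and adapter:

# Drive (physical disk) write cache: enabled on VD 0 (SSD), disabled on VD 1 (HDD)
megacli -LDSetProp -EnDskCache -L0 -a0
megacli -LDSetProp -DisDskCache -L1 -a0

# Controller cache: write-back and adaptive read-ahead on both VDs
megacli -LDSetProp WB -Lall -a0
megacli -LDSetProp ADRA -Lall -a0

# I/O policy: Direct on VD 0, Cached on VD 1
megacli -LDSetProp Direct -L0 -a0
megacli -LDSetProp Cached -L1 -a0

# BBU behaviour: drop to write-through on bad BBU for VD 0, keep write-back for VD 1
megacli -LDSetProp NoCachedBadBBU -L0 -a0
megacli -LDSetProp CachedBadBBU -L1 -a0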
Many thanks,
Steven

On 16 March 2018 at 06:20, Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote:
> Hi Tim,
>
> I wanted to share our experience here, as we've been in a situation in the
> past (on a Friday afternoon, of course...) where injecting a snaptrim
> priority of 40 into all OSDs in the cluster (to speed up snaptrimming)
> resulted in all OSD nodes crashing at the same time, in all 3 datacenters.
> My first thought at that particular moment was: call your wife and tell
> her you'll be late home. :-D
>
> And this event was not related to a power outage.
>
> Fortunately, I had spent some time (when building the cluster) thinking
> about how each option should be set along the I/O path for #1 data
> consistency and #2 best possible performance, and that was:
>
> - single SATA disks as RAID 0 with write-back PERC caching on each virtual disk
> - write barriers kept enabled on XFS mounts (I had measured a 1.5%
> performance gap, so disabling barriers was no good choice - and it never
> is, actually)
> - SATA disk write buffers disabled (as they are volatile)
> - SSD journal disk write buffers enabled (as they are persistent)
>
> We hardly believed it, but when all nodes came back online, all OSDs
> rejoined the cluster and service was back as it was before. We didn't face
> any XFS errors, nor did we have any further scrub or deep-scrub errors.
>
> My assumption was that the extra power demand of snaptrimming may have
> led to node power instability, or that we hit a SATA firmware or maybe a
> kernel bug.
>
> We also had the SSDs as RAID 0 with the write-back PERC cache on, but
> changed that to write-through as we could get more IOPS from them for our
> workloads.
>
> Thanks for sharing the information about Dell changing the default disk
> buffer policy. What's odd is that all buffers were disabled after the
> node rebooted, including the SSDs! I am now changing them back to enabled
> for the SSDs only.
>
> As said by others, you'd better keep the disk buffers disabled and
> rebuild the OSDs after setting the disks up as RAID 0 with write-back
> enabled.
>
> Best,
>
> Frédéric.
>
> On 14/03/2018 at 20:42, Tim Bishop wrote:
>
>> I'm using Ceph on Ubuntu 16.04 on Dell R730xd servers. A recent [1]
>> update to the PERC firmware disabled the disk write cache by default,
>> which made a noticeable difference to the latency on my disks (spinning
>> disks, not SSDs) - by as much as a factor of 10.
>>
>> For reference, their change list says:
>>
>> "Changes default value of drive cache for 6 Gbps SATA drive to disabled.
>> This is to align with the industry for SATA drives. This may result in a
>> performance degradation especially in non-Raid mode. You must perform an
>> AC reboot to see existing configurations change."
>>
>> It's fairly straightforward to re-enable the cache, either in the PERC
>> BIOS or by using hdparm, and doing so returns the latency to what it was
>> before.
>>
>> Checking the Ceph documentation, I can see that older versions [2]
>> recommended disabling the write cache for older kernels. But given I'm
>> using a newer kernel, and there's no mention of this in the Luminous
>> docs, is it safe to assume it's OK to enable the disk write cache now?
>>
>> If it makes a difference, I'm using a mixture of filestore and bluestore
>> OSDs - migration is still ongoing.
>>
>> Thanks,
>>
>> Tim.
>>
>> [1] - https://www.dell.com/support/home/uk/en/ukdhs1/Drivers/DriversDetails?driverId=8WK8N
>> [2] - http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
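P.S. For anyone wanting to check or flip the drive write cache from the OS, as Tim mentions, something along these lines should work. Just a sketch with device names assumed: hdparm covers SATA drives, SAS drives (like mine) need sdparm instead, and drives hidden behind a PERC virtual disk are better handled through megacli's -DskCache properties as above.

# SATA: query, then enable (-W1) or disable (-W0) the drive write cache
hdparm -W /dev/sda
hdparm -W1 /dev/sda

# SAS: query, then set or clear the WCE (write cache enable) bit
sdparm --get WCE /dev/sdb
sdparm --set WCE /dev/sdb
sdparm --clear WCE /dev/sdb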