Hi All,

Can someone please confirm that, for an optimal performance/safety compromise, the following would be the best settings (id 0 is the SSD, id 1 is the HDD)? Alternatively, any suggestions, shared configurations, or advice would be greatly appreciated.
Note: the server is a Dell R620 with a PERC H710 (1 GB cache). The SSD is an enterprise Toshiba PX05SMB040Y and the HDD is an enterprise Seagate ST600MM0006.

megacli -LDGetProp -DskCache -Lall -a0

Adapter 0-VD 0(target id: 0): Disk Write Cache : Enabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disabled

megacli -LDGetProp -Cache -Lall -a0

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Cached, Write Cache OK if bad BBU
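In case it helps anyone reading, these policies can be set per virtual disk with MegaCli's LDSetProp options. This is only a sketch - the -L/-a indexes match my layout above, so adjust them for your own virtual disks and adapter:

# Drive (physical disk) write cache: enabled on VD 0 (SSD), disabled on VD 1 (HDD)
megacli -LDSetProp -EnDskCache -L0 -a0
megacli -LDSetProp -DisDskCache -L1 -a0

# Controller cache: write-back and adaptive read-ahead on both VDs
megacli -LDSetProp WB -Lall -a0
megacli -LDSetProp ADRA -Lall -a0

# I/O policy: Direct on VD 0, Cached on VD 1
megacli -LDSetProp Direct -L0 -a0
megacli -LDSetProp Cached -L1 -a0

# BBU behaviour: drop to write-through on bad BBU for VD 0, keep write-back for VD 1
megacli -LDSetProp NoCachedBadBBU -L0 -a0
megacli -LDSetProp CachedBadBBU -L1 -a0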
Many thanks,
Steven

On 16 March 2018 at 06:20, Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote:
> Hi Tim,
>
> I wanted to share our experience here, as we've been in a situation in the
> past (on a Friday afternoon, of course...) where injecting a snaptrim
> priority of 40 into all OSDs in the cluster (to speed up snaptrimming)
> resulted in all OSD nodes crashing at the same time, in all 3 datacenters.
> My first thought at that particular moment was: call your wife and tell
> her you'll be late home. :-D
>
> And this event was not related to a power outage.
>
> Fortunately, I had spent some time (when building the cluster) thinking
> about how each option should be set along the I/O path for #1 data
> consistency and #2 best possible performance, and that was:
>
> - single SATA disks as RAID 0 with write-back PERC caching on each virtual disk
> - write barriers kept enabled on XFS mounts (I had measured a 1.5%
> performance gap, so disabling barriers was no good choice - and it never
> is, actually)
> - SATA disk write buffers disabled (as they are volatile)
> - SSD journal disk write buffers enabled (as they are persistent)
>
> We hardly believed it, but when all nodes came back online, all OSDs
> rejoined the cluster and service was back as it was before. We didn't face
> any XFS errors, nor did we have any further scrub or deep-scrub errors.
>
> My assumption was that the extra power demand of snaptrimming may have
> led to node power instability, or that we hit a SATA firmware or maybe a
> kernel bug.
>
> We also had the SSDs as RAID 0 with the write-back PERC cache on, but
> changed that to write-through as we could get more IOPS from them for our
> workloads.
>
> Thanks for sharing the information about Dell changing the default disk
> buffer policy. What's odd is that all buffers were disabled after the
> node rebooted, including the SSDs! I am now changing them back to enabled
> for the SSDs only.
>
> As said by others, you'd better keep the disk buffers disabled and
> rebuild the OSDs after setting the disks up as RAID 0 with write-back
> enabled.
>
> Best,
>
> Frédéric.
>
> On 14/03/2018 at 20:42, Tim Bishop wrote:
>
>> I'm using Ceph on Ubuntu 16.04 on Dell R730xd servers. A recent [1]
>> update to the PERC firmware disabled the disk write cache by default,
>> which made a noticeable difference to the latency on my disks (spinning
>> disks, not SSDs) - by as much as a factor of 10.
>>
>> For reference, their change list says:
>>
>> "Changes default value of drive cache for 6 Gbps SATA drive to disabled.
>> This is to align with the industry for SATA drives. This may result in a
>> performance degradation especially in non-Raid mode. You must perform an
>> AC reboot to see existing configurations change."
>>
>> It's fairly straightforward to re-enable the cache, either in the PERC
>> BIOS or by using hdparm, and doing so returns the latency to what it was
>> before.
>>
>> Checking the Ceph documentation, I can see that older versions [2]
>> recommended disabling the write cache for older kernels. But given I'm
>> using a newer kernel, and there's no mention of this in the Luminous
>> docs, is it safe to assume it's OK to enable the disk write cache now?
>>
>> If it makes a difference, I'm using a mixture of filestore and bluestore
>> OSDs - migration is still ongoing.
>>
>> Thanks,
>>
>> Tim.
>>
>> [1] - https://www.dell.com/support/home/uk/en/ukdhs1/Drivers/DriversDetails?driverId=8WK8N
>> [2] - http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
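P.S. For anyone wanting to check or flip the drive write cache from the OS, as Tim mentions, something along these lines should work. Just a sketch with device names assumed: hdparm covers SATA drives, SAS drives (like mine) need sdparm instead, and drives hidden behind a PERC virtual disk are better handled through megacli's -DskCache properties as above.

# SATA: query, then enable (-W1) or disable (-W0) the drive write cache
hdparm -W /dev/sda
hdparm -W1 /dev/sda

# SAS: query, then set or clear the WCE (write cache enable) bit
sdparm --get WCE /dev/sdb
sdparm --set WCE /dev/sdb
sdparm --clear WCE /dev/sdb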