After reading Reed's comments about losing power to his data center, I
think he brings up a lot of good points.
So take Dell's advice that I linked into consideration for your own
environment.
 
We also have 8TB disks with an Intel P3700 for the journal.
Our large UPS and new generators, which are tested weekly, are great...
but now we will have to test for ourselves what happens if the
generators do not start.....

Joe
 


>>> Reed Dier <reed.d...@focusvq.com> 3/14/2018 1:55 PM >>>
Tim, 

I can corroborate David's sentiment that this is a recipe for
disaster.

In the early days of my Ceph cluster, I had 8TB SAS drives behind an
LSI RAID controller as RAID0 volumes (no IT mode), with on-drive
write-caching enabled (pdcache=default). The data center where this was
colocated was subsequently struck by lightning; grid power was
interrupted and the generators failed to start, so when the UPS for
the DC ran out, so did my cluster. Most of my issues were related to
XFS filesystem errors.
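
For anyone wanting to check this on their own LSI controllers, the
relevant knobs look roughly like the following (a sketch; it assumes
storcli/MegaCli are installed and that /c0 is your controller, so
adapt to your setup):

    # Show current cache settings for all virtual drives on controller 0
    storcli /c0/vall show all
    # Turn off the on-drive (volatile) write cache behind the controller
    storcli /c0/vall set pdcache=off
    # Older MegaCli equivalent
    MegaCli64 -LDSetProp -DisDskCache -LAll -aALL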

Luckily, I was bitten before I had important data on Ceph (mostly
CephFS), but everything was lost.
It was a painful, but extremely helpful, learning experience.

I was able to recreate the OSD failure with power pulls to nodes,
narrowing my issues down to the pdcache.
I was then able to add BBUs to the RAID cards and enable write-back
caching, making my disks power-loss tolerant while keeping the improved
write performance. And when the BBU fails, I have the controller
configured to revert to write-through, which I have confirmed is also
tolerant.
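
For reference, the policy that behaves this way is set roughly like so
(a sketch, again assuming controller 0; 'wb' reverts to write-through
when the BBU is bad, unlike 'awb', which forces write-back regardless):

    # Write-back while the BBU is healthy, write-through otherwise
    storcli /c0/vall set wrcache=wb
    # MegaCli equivalent: write-back, plus no caching on a bad BBU
    MegaCli64 -LDSetProp WB -LAll -aALL
    MegaCli64 -LDSetProp NoCachedBadBBU -LAll -aALL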

I have since upgraded these drives to bluestore, and did a power pull
on a single node to verify integrity, which I was able to do.

Worth mentioning that my 8TB SAS spinners were journaled by / are now
block.db'd on an Intel P3700 NVMe disk, which advertises "Enhanced
power-loss data protection". This appears to come in the form of
capacitors on the NVMe card that keep in-flight writes from being lost
during power loss.
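
You can sanity-check whether an NVMe drive even exposes a volatile
write cache with nvme-cli (a sketch; /dev/nvme0 is a placeholder for
your device):

    # 'vwc : 0' means there is no volatile write cache to lose;
    # the capacitors protect writes already accepted by the drive
    nvme id-ctrl /dev/nvme0 | grep -i vwc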

tl;dr: steer clear of on-disk write caching where possible, unless you
can guarantee never losing power.

Reed

> On Mar 14, 2018, at 3:08 PM, David Byte <db...@suse.com> wrote:
> 
> Tim,
> 
> Enabling the drive write cache is a recipe for disaster.  In the
> event of a power interruption, you have in-flight data that is stored
> in the cache and uncommitted to the disk media itself.  Since the
> drive cache does not have a battery or supercap to keep it powered,
> you end up losing the data in the cache when the power is interrupted.
> Now, if this is just a single node and you have size=3 or a decent EC
> scheme in place, Ceph should be able to recover and keep going.
> However, if more than one node loses power, you start running the
> risk of corrupting multiple or dare I say *all* copies of the data
> that was supposed to be written, with the result being data loss.
> This is why it is standard practice to disable drive caches, not just
> with Ceph, but with any enterprise storage offering.
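> 
> To check what a given SAS drive is actually doing, sdparm works (a
> sketch; /dev/sda is a placeholder for your device):
> 
>     # WCE (Write Cache Enable) = 0 means the volatile on-drive cache is off
>     sdparm --get=WCE /dev/sda
>     # Disable it and persist the setting across power cycles
>     sdparm --set WCE=0 --save /dev/sda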
> 
> In testing that I've done, using a battery-backed cache on the RAID
> controller, with each drive as its own RAID-0, has positive
> performance results.  This is something to try to see if you can
> regain some of the performance, but as always in storage, YMMV.
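> 
> As a rough sketch, that per-drive RAID-0 layout can be created with
> storcli like this (the enclosure:slot ID 252:0 is a placeholder;
> repeat per physical drive):
> 
>     # One RAID-0 VD per drive: controller write-back cache on,
>     # on-drive volatile cache off
>     storcli /c0 add vd type=raid0 drives=252:0 wb ra direct pdcache=off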
> 
> David Byte
> Sr. Technology Strategist
> SCE Enterprise Linux 
> SCE Enterprise Storage
> Alliances and SUSE Embedded
> db...@suse.com
> 918.528.4422
> On 3/14/18, 2:43 PM, "ceph-users on behalf of Tim Bishop"
> <ceph-users-boun...@lists.ceph.com on behalf of tim-li...@bishnet.net>
> wrote:
> 
>    I'm using Ceph on Ubuntu 16.04 on Dell R730xd servers. A recent [1]
>    update to the PERC firmware disabled the disk write cache by
>    default, which made a noticeable difference to the latency on my
>    disks (spinning disks, not SSD) - by as much as a factor of 10.
> 
>    For reference, their change list says:
> 
>    "Changes default value of drive cache for 6 Gbps SATA drive to
>    disabled. This is to align with the industry for SATA drives. This
>    may result in a performance degradation especially in non-Raid
>    mode. You must perform an AC reboot to see existing configurations
>    change."
> 
>    It's fairly straightforward to re-enable the cache either in the
>    PERC BIOS, or by using hdparm, and doing so returns the latency
>    back to what it was before.
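> 
>    For example (/dev/sdb below is a placeholder for one of my disks):
> 
>    # Show the current on-drive write cache state
>    hdparm -W /dev/sdb
>    # -W1 re-enables the cache; -W0 disables it
>    hdparm -W1 /dev/sdb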
> 
>    Checking the Ceph documentation I can see that older versions [2]
>    recommended disabling the write cache for older kernels. But given
>    I'm using a newer kernel, and there's no mention of this in the
>    Luminous docs, is it safe to assume it's ok to enable the disk
>    write cache now?
> 
>    If it makes a difference, I'm using a mixture of filestore and
>    bluestore OSDs - migration is still ongoing.
> 
>    Thanks,
> 
>    Tim.
> 
>    [1] - https://www.dell.com/support/home/uk/en/ukdhs1/Drivers/DriversDetails?driverId=8WK8N
>    [2] - http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
> 
>    -- 
>    Tim Bishop
>    http://www.bishnet.net/tim/
>    PGP Key: 0x6C226B37FDF38D55
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com