Hi Steven,
On 16/03/2018 at 17:26, Steven Vacaroaia wrote:
Hi All,
Can someone please confirm that, for the best performance/safety
compromise, the following would be the right settings (id 0 is the SSD,
id 1 is the HDD)?
Alternatively, any suggestions, shared configurations or advice would
be greatly appreciated.
Note
the server is a DELL R620 with a PERC 710, 1GB cache
the SSD is an enterprise Toshiba PX05SMB040Y
the HDD is an enterprise Seagate ST600MM0006
megacli -LDGetProp -DskCache -Lall -a0
Adapter 0-VD 0(target id: 0): Disk Write Cache : Enabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disabled
Sounds good to me as Toshiba PX05SMB040Y SSDs include power-loss
protection
(https://toshiba.semicon-storage.com/eu/product/storage-products/enterprise-ssd/px05smbxxx.html)
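Should you ever need to change these per-VD settings, the matching set
commands should be something like the following (untested here, and
assuming the same VD numbering as in your output, i.e. -L0 = SSD,
-L1 = HDD):
megacli -LDSetProp -EnDskCache -L0 -a0     # keep the PLP-protected SSD's drive cache enabled
megacli -LDSetProp -DisDskCache -L1 -a0    # keep the HDD's volatile drive cache disabled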
megacli -LDGetProp -Cache -Lall -a0
Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Cached, Write Cache OK if bad BBU
I've always wondered about ReadAdaptive without ever finding a real
answer. This would need clarification from the RHCS / Ceph performance team.
With a 1GB PERC cache, my guess is that you should set the SSDs to
write-through whatever your workload is, so that the whole cache is
dedicated to the HDDs only and your nodes don't hit a 'PERC cache full'
issue that would be hard to diagnose. Besides, write caching should
always be avoided with a bad BBU.
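On the PERC, that would translate to something like the commands below.
This is only a sketch from memory (the exact flag spelling can vary
between megacli versions, so double-check with megacli -h), and -L0/-L1
assume the same VD numbering as in your output:
megacli -LDSetProp WT -L0 -a0               # SSD VD to write-through, leaving the PERC cache to the HDDs
megacli -LDSetProp NoCachedBadBBU -L1 -a0   # never use the controller write cache while the BBU is bad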
Regards,
Frédéric.
Many thanks
Steven
On 16 March 2018 at 06:20, Frédéric Nass
<frederic.n...@univ-lorraine.fr> wrote:
Hi Tim,
I wanted to share our experience here, as we've been in a situation
in the past (on a Friday afternoon, of course...) where injecting a
snaptrim priority of 40 into all OSDs in the cluster (to speed up
snaptrimming) resulted in all OSD nodes crashing at the same time,
in all 3 datacenters. My first thought at that particular moment
was: call your wife and tell her you'll be late home. :-D
And this event was not related to a power outage.
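(For clarity, the injection was done with something along these lines -
the option name osd_snap_trim_priority is from memory, so take it as an
illustration rather than a copy/paste recipe:)
ceph tell osd.* injectargs '--osd_snap_trim_priority 40'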
Fortunately I had spent some time (when building the cluster)
thinking about how each option should be set along the I/O path, for
#1 data consistency and #2 best possible performance, and that was:
- single SATA disks as Raid0 with writeback PERC caching on each
virtual disk
- write barriers kept enabled on XFS mounts (I had measured only a 1.5
% performance gap, so disabling barriers was no good choice, and it
never actually is)
- SATA disks' write buffer disabled (as it is volatile)
- SSD journal disks' write buffer enabled (as it is persistent) - see
the check commands sketched right below
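To check or set those last points from the OS, something like this
should do. It's only a sketch: the device names are examples, and
drives sitting behind PERC virtual disks may not expose their cache
bit directly, in which case the controller's DskCache setting is the
one that matters:
grep nobarrier /proc/mounts     # should return nothing if barriers are still enabled
sdparm --get=WCE /dev/sdb       # read a SAS drive's write cache bit (0 = disabled)
sdparm --set=WCE /dev/sdc       # enable the write cache on a persistent (PLP) SSD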
We could hardly believe it, but when all nodes came back online, all
OSDs rejoined the cluster and service was back as it was before.
We didn't face any XFS errors, nor did we have any further scrub or
deep-scrub errors.
My assumption was that the extra power demand from snaptrimming may
have led to node power instability, or that we hit a SATA firmware
or maybe a kernel bug.
We also had the SSDs as Raid0 with the writeback PERC cache ON, but
changed that to write-through as we could get more IOPS from them
with our workloads.
Thanks for sharing the information about DELL changing the default
disk buffer policy. What's odd is that all buffers were disabled
after the node rebooted, including the SSDs!
I am now changing them back to enabled for the SSDs only.
As said by others, you'd better keep the disk buffers disabled and
rebuild the OSDs after setting the disks up as Raid0 with writeback
enabled.
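Creating such a single-disk Raid0 VD with writeback would look
roughly like the line below. The enclosure:slot pair [32:4] is just a
placeholder (list yours with megacli -PDList -a0), and the exact
syntax may vary between megacli versions:
megacli -CfgLdAdd -r0[32:4] WB RA Direct NoCachedBadBBU -a0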
Best,
Frédéric.
On 14/03/2018 at 20:42, Tim Bishop wrote:
I'm using Ceph on Ubuntu 16.04 on Dell R730xd servers. A recent [1]
update to the PERC firmware disabled the disk write cache by default,
which made a noticeable difference to the latency on my disks
(spinning disks, not SSD) - by as much as a factor of 10.
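(If anyone wants to compare before/after on their own nodes, watching
the latency columns in iostat is one easy way to see it - the device
names below are just examples:)
iostat -x 5 sdb sdc     # watch the await / w_await columns under load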
For reference their change list says:
"Changes default value of drive cache for 6 Gbps SATA drive to
disabled. This is to align with the industry for SATA drives. This may
result in a performance degradation especially in non-Raid mode. You
must perform an AC reboot to see existing configurations change."
It's fairly straightforward to re-enable the cache either in the PERC
BIOS, or by using hdparm, and doing so returns the latency back to
what it was before.
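For anyone doing the same, the hdparm side looks roughly like this
(sdX is a placeholder for the disk behind the VD; note the setting may
not survive a power cycle, so it has to be reapplied or set in the
PERC instead):
hdparm -W /dev/sdX      # show the current drive write-cache setting
hdparm -W1 /dev/sdX     # turn the drive write cache back on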
Checking the Ceph documentation I can see that older versions [2]
recommended disabling the write cache for older kernels. But given I'm
using a newer kernel, and there's no mention of this in the Luminous
docs, is it safe to assume it's ok to enable the disk write cache now?
If it makes a difference, I'm using a mixture of filestore and
bluestore OSDs - migration is still ongoing.
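(If it's useful to anyone, the objectstore type of a given OSD shows
up in its metadata - OSD id 0 below is just an example:)
ceph osd metadata 0 | grep osd_objectstore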
Thanks,
Tim.
[1] -
https://www.dell.com/support/home/uk/en/ukdhs1/Drivers/DriversDetails?driverId=8WK8N
[2] -
http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com